[U-Boot-Users] RFC: Booting the Linux/ppc64 kernel without Open Firmware HOWTO

Hi !
Here's the very first draft of my HOWTO about booting the linux/ppc64 kernel without open firmware. It's still incomplete, the main chapter describing which nodes & properties are required and their format is still missing (though it will basically be a subset of the Open Firmware specification & bindings). The format of the flattened device-tree is documented.
It's a first draft, so please, don't be too harsh :) Comments are welcome.
Booting the Linux/ppc64 kernel without Open Firmware
----------------------------------------------------
(c) 2005 Benjamin Herrenschmidt benh@kernel.crashing.org, IBM Corp.
May 18, 2005: Rev 0.1 - Initial draft, no chapter III yet.
I - Introduction
================
During the recent development of the Linux/ppc64 kernel, and more specifically the addition of new platform types outside of the old IBM pSeries/iSeries pair, it was decided to enforce some strict rules regarding the kernel entry and bootloader <-> kernel interfaces, in order to avoid the degeneration that has become the ppc32 kernel entry point and the way a new platform should be added to the kernel. The legacy iSeries platform breaks those rules as it predates this scheme, but no new board support will be accepted in the main tree that doesn't follow them properly.
1) Entry point
--------------
There is one and only one entry point to the kernel, at the start of the kernel image. That entry point supports two calling conventions:
a) Boot from Open Firmware. If your firmware is compatible with Open Firmware (IEEE 1275) or provides an OF compatible client interface API (support for "interpret" callback of forth words isn't required), you can enter the kernel with:
r5 : OF callback pointer as defined by IEEE 1275 bindings to powerpc. Only the 32 bits client interface is currently supported
r3, r4 : address & length of an initrd if any or 0
The MMU is either on or off; the kernel will run the trampoline located in arch/ppc64/kernel/prom_init.c to extract the device-tree and other information from Open Firmware and build a flattened device-tree as described in b). prom_init() will then re-enter the kernel using the second method. This trampoline code runs in the context of the firmware, which is supposed to handle all exceptions during that time.
b) Direct entry with a flattened device-tree block. This entry point is called by a) after the OF trampoline and can also be called directly by a bootloader that does not support the Open Firmware client interface. It is also used by "kexec" to implement "hot" booting of a new kernel from a previous running one. This method is what I will describe in more detail in this document, as method a) is simply standard Open Firmware, and thus should be implemented according to the various standard documents defining it and its binding to the PowerPC platform. The entry point definition then becomes:
r3 : physical pointer to the device-tree block (defined in chapter II)
r4 : physical pointer to the kernel itself. This is used by the assembly code to properly disable the MMU in case you are entering the kernel with MMU enabled and a non-1:1 mapping.
r5 : NULL (so as to differentiate it from method a)
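As a rough illustration of calling convention b), here is a minimal sketch of how a bootloader might collect the three register values before jumping to the kernel. The names entry_regs and flat_dt_entry_regs are made up for this example and are not part of any existing code; the jump itself is normally a few lines of platform specific assembly and is deliberately not shown.

```c
#include <stdint.h>

/* Hypothetical helper type: the values a bootloader would load into
 * r3, r4 and r5 before branching to the start of the kernel image. */
struct entry_regs {
    uint64_t r3;    /* physical pointer to the device-tree block */
    uint64_t r4;    /* physical pointer to the kernel image itself */
    uint64_t r5;    /* must be 0 (NULL): there is no OF callback pointer */
};

/* Fill in the register values for a direct (non Open Firmware) entry. */
static struct entry_regs flat_dt_entry_regs(uint64_t dt_phys,
                                            uint64_t kernel_phys)
{
    struct entry_regs regs;

    regs.r3 = dt_phys;
    regs.r4 = kernel_phys;
    regs.r5 = 0;
    return regs;
}
```

The only point of the sketch is that r5 is what tells the two conventions apart: a non-zero r5 is taken as an OF callback pointer, zero means r3 points to a flattened device-tree block.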
2) Board support
----------------
Board supports (platforms) are not exclusive config options. An arbitrary set of board supports can be built in a single kernel image. The kernel will "know" what set of functions to use for a given platform based on the content of the device-tree. Thus, you should:
a) add your platform support as a _boolean_ option in arch/ppc64/Kconfig, following the example of PPC_PSERIES, PPC_PMAC and PPC_MAPLE. The latter is probably a good example of a board support to start from.
b) create your main platform file as "arch/ppc64/kernel/myboard_setup.c" and add it to the Makefile under the condition of your CONFIG_ option. This file will define a structure of type "ppc_md" containing the various callbacks that the generic code will use to get to your platform specific code
c) Add a reference to your "ppc_md" structure in the "machines" table in arch/ppc64/kernel/setup.c
d) request and get assigned a platform number (see PLATFORM_* constants in include/asm-ppc64/processor.h)
I will describe later the boot process and various callbacks that your platform should implement.
II - The DT block format
========================
This chapter defines the actual format of the flattened device-tree passed to the kernel. The actual content of it and kernel requirements are described later. You can find examples of code manipulating that format in various places, including arch/ppc64/kernel/prom_init.c which will generate a flattened device-tree from the Open Firmware representation, or the fs2dt utility which is part of the kexec tools which will generate one from a filesystem representation. It is expected that a bootloader like uboot provides a bit more support, that will be discussed later as well.
1) Header
---------
The kernel is entered with r3 pointing to an area of memory that is roughly described in include/asm-ppc64/prom.h by the structure boot_param_header:
struct boot_param_header {
        u32 magic;              /* magic word OF_DT_HEADER */
        u32 totalsize;          /* total size of DT block */
        u32 off_dt_struct;      /* offset to structure */
        u32 off_dt_strings;     /* offset to strings */
        u32 off_mem_rsvmap;     /* offset to memory reserve map */
        u32 version;            /* format version */
        u32 last_comp_version;  /* last compatible version */
        /* version 2 fields below */
        u32 boot_cpuid_phys;    /* Which physical CPU id we're booting on */
};
Along with the constants:
/* Definitions used by the flattened device tree */
#define OF_DT_HEADER            0xd00dfeed      /* 4: version, 4: total size */
#define OF_DT_BEGIN_NODE        0x1             /* Start node: full name */
#define OF_DT_END_NODE          0x2             /* End node */
#define OF_DT_PROP              0x3             /* Property: name off, size, content */
#define OF_DT_END               0x9
All values in this header are in big endian format. The various fields in this header are defined more precisely below. All "offset" values are in bytes from the start of the header, that is, from the value of r3.
- magic
This is a magic value that "marks" the beginning of the device-tree block header. It contains the value 0xd00dfeed and is defined by the constant OF_DT_HEADER
- totalsize
This is the total size of the DT block including the header. The "DT" block should enclose all data structures defined in this chapter (which are pointed to by offsets in this header). That is, the device-tree structure, strings, and the memory reserve map.
- off_dt_struct
This is an offset from the beginning of the header to the start of the "structure" part of the device tree. (see 2) device tree)
- off_dt_strings
This is an offset from the beginning of the header to the start of the "strings" part of the device-tree
- off_mem_rsvmap
This is an offset from the beginning of the header to the start of the reserved memory map. This map is a list of pairs of 64 bit integers. Each pair is a physical address and a size. The list is terminated by an entry of size 0. This map provides the kernel with a list of physical memory areas that are "reserved" and thus not to be used for memory allocations, especially during early initialisation. The kernel needs to allocate memory during boot for things like un-flattening the device-tree, allocating an MMU hash table, etc... Those allocations must be done in such a way as to avoid overwriting critical things like, on Open Firmware capable machines, the RTAS instance, or on some pSeries, the TCE tables used for the iommu. Typically, the reserve map should contain _at least_ this DT block itself (header, totalsize). If you are passing an initrd to the kernel, you should reserve it as well. You do not need to reserve the kernel image itself. The map should be 64 bit aligned.
- version
This is the version of this structure. Version 1 stops here. Version 2 adds an additional field boot_cpuid_phys. You should always generate a structure of the highest version defined at the time of your implementation. That is version 2.
- last_comp_version
Last compatible version. This indicates down to what version of the DT block you are backward compatible with. For example, version 2 is backward compatible with version 1 (that is, a kernel built for version 1 will be able to boot with a version 2 format). You should put a 1 in this field unless a new incompatible version of the DT block is defined.
- boot_cpuid_phys
This field only exists on version 2 headers. It indicates which physical CPU ID is calling the kernel entry point. This is used, among others, by kexec. If you are on an SMP system, this value should match the content of the "reg" property of the CPU node in the device-tree corresponding to the CPU calling the kernel entry point (see further chapters for more information on the required device-tree contents).
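To make the field descriptions above a bit more concrete, here is a small sketch of how a consumer could read and sanity-check the header. It is not the kernel's code: the names dt_header, be32_at and dt_parse_header are invented for the example, and the checks are only the ones that follow directly from the text (magic value, offsets inside totalsize, 64 bit alignment of the reserve map, version handling).

```c
#include <stddef.h>
#include <stdint.h>

#define OF_DT_HEADER 0xd00dfeedu

/* Host-order copy of the header fields (hypothetical helper type). */
struct dt_header {
    uint32_t magic, totalsize, off_dt_struct, off_dt_strings;
    uint32_t off_mem_rsvmap, version, last_comp_version, boot_cpuid_phys;
};

/* Read one big endian 32 bit value, whatever the host endianness is. */
static uint32_t be32_at(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Returns 0 if the block looks usable by a reader that understands
 * versions 1 and 2, -1 otherwise. */
static int dt_parse_header(const uint8_t *blob, size_t len,
                           struct dt_header *h)
{
    if (len < 7 * 4)                    /* version 1 header: 7 words */
        return -1;
    h->magic             = be32_at(blob + 0);
    h->totalsize         = be32_at(blob + 4);
    h->off_dt_struct     = be32_at(blob + 8);
    h->off_dt_strings    = be32_at(blob + 12);
    h->off_mem_rsvmap    = be32_at(blob + 16);
    h->version           = be32_at(blob + 20);
    h->last_comp_version = be32_at(blob + 24);
    h->boot_cpuid_phys   = 0;

    if (h->magic != OF_DT_HEADER || h->totalsize > len)
        return -1;
    if (h->last_comp_version > 2)       /* newer, incompatible format */
        return -1;
    if (h->version >= 2) {              /* version 2 adds one word */
        if (len < 8 * 4)
            return -1;
        h->boot_cpuid_phys = be32_at(blob + 28);
    }
    if (h->off_dt_struct >= h->totalsize ||
        h->off_dt_strings >= h->totalsize ||
        h->off_mem_rsvmap >= h->totalsize)
        return -1;
    if (h->off_mem_rsvmap % 8 != 0)     /* map must be 64 bit aligned */
        return -1;
    return 0;
}
```

Reading the fields byte by byte keeps the sketch independent of the endianness of the machine it runs on, which is handy if the same code is also used in a host side tool that generates or checks blobs.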
So the typical layout of a DT block (though the various parts don't need to be in that order) looks like (addresses go from top to bottom):
             ------------------------------
       r3 -> |  struct boot_param_header  |
             ------------------------------
             |      (alignment gap) (*)   |
             ------------------------------
             |      memory reserve map    |
             ------------------------------
             |      (alignment gap)       |
             ------------------------------
             |                            |
             |    device-tree structure   |
             |                            |
             ------------------------------
             |      (alignment gap)       |
             ------------------------------
             |                            |
             |     device-tree strings    |
             |                            |
      -----> ------------------------------
      |
      |
      --- (r3 + totalsize)
(*) The alignment gaps are not necessarily present, their presence and size are dependent on the various alignment requirements of the individual data blocks.
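The memory reserve map described under off_mem_rsvmap is simple enough to be walked directly. The sketch below checks whether a candidate allocation overlaps a reserved area; the function names are invented for this example, and it assumes the caller passes a pointer to the start of the map (r3 + off_mem_rsvmap) together with the number of bytes available from there.

```c
#include <stddef.h>
#include <stdint.h>

/* Read one big endian 64 bit value. */
static uint64_t be64_at(const uint8_t *p)
{
    uint64_t v = 0;

    for (int i = 0; i < 8; i++)
        v = (v << 8) | p[i];
    return v;
}

/* Returns 1 if [addr, addr + size) overlaps a reserved area, 0 if it
 * does not, and -1 if no terminating entry (size 0) is found within
 * map_len bytes. Address wrap-around is not handled in this sketch. */
static int dt_is_reserved(const uint8_t *map, size_t map_len,
                          uint64_t addr, uint64_t size)
{
    for (size_t off = 0; off + 16 <= map_len; off += 16) {
        uint64_t base = be64_at(map + off);
        uint64_t len  = be64_at(map + off + 8);

        if (len == 0)                   /* end of list */
            return 0;
        if (addr < base + len && base < addr + size)
            return 1;
    }
    return -1;
}
```

An early boot allocator could call something like this before handing out a range, which is exactly the kind of check the reserve map exists for.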
2) Device tree generalities
---------------------------
The device-tree itself is separated into two different blocks, a structure block and a strings block. Both need to be page aligned.
First, let's quickly describe the device-tree concept before detailing the storage format. This chapter does _not_ describe the detail of the required types of nodes & properties for the kernel, this is done later in chapter III.
The device-tree layout is strongly inherited from the definition of the Open Firmware IEEE 1275 device-tree. It's basically a tree of nodes, each node having two or more named properties. A property can have a value or not.
It is a tree, so each node has one and only one parent except for the root node which has no parent.
A node has 2 names. The actual node name is contained in a property of type "name" in the node property list whose value is a zero terminated string and is mandatory. There is also a "unit name" that is used to differentiate nodes with the same name at the same level; it is usually made of the node name, the "@" sign, and a "unit address", whose definition is specific to the bus type the node sits on. The unit name doesn't exist as a property per se but is included in the device-tree structure. It is typically used to represent "path" in the device-tree. More details about these will be provided later. The kernel ppc64 generic code does not make any formal use of the unit address though (though some board support code may do) so the only real requirement here for the unit address is to ensure uniqueness of the node unit name at a given level. Nodes with no notion of address and no possible sibling of the same name (like /memory or /cpus) may omit the unit address in the context of this specification, or use the "@0" default unit address. The unit name is used to define a node "full path", which is the concatenation of all parent nodes' unit names separated with "/".
The root node is defined as being named "device-tree" and has no unit address (no @ symbol followed by a unit address). When manipulating device-tree "paths", the root of the tree is generally represented by a simple slash sign "/".
Every node which actually represents an actual device (that is, which isn't only a virtual "container" for more nodes, like "/cpus" is) is also required to have a "device_type" property indicating the type of node.
Finally, every node is required to have a "linux,phandle" property. Real Open Firmware implementations don't provide it as it's generated on the fly by the prom_init.c trampoline from the Open Firmware "phandle". Implementations providing a flattened device-tree directly should provide this property. This property is a 32 bit value that uniquely identifies a node. You are free to use whatever values or system of values, internal pointers, or whatever to generate these, the only requirement is that every single node of the tree you are passing to the kernel has a unique value in this property.
This can be used in some cases for nodes to reference other nodes.
Here is an example of a simple device-tree. In this example, an "o" designates a node followed by the node unit name. Properties are presented with their name followed by their content. "content" represents an ASCII string (zero terminated) value, while <content> represents a 32 bit hexadecimal value. The various nodes in this example will be discussed in a later chapter. At this point, it is only meant to give you an idea of what a device-tree looks like.
  / o device-tree
      |- name = "device-tree"
      |- model = "MyBoardName"
      |- compatible = "MyBoardFamilyName"
      |- #address-cells = <2>
      |- #size-cells = <2>
      |- linux,phandle = <0>
      |
      o cpus
      | | - name = "cpus"
      | | - linux,phandle = <1>
      | |
      | o PowerPC,970@0
      |   |- name = "PowerPC,970"
      |   |- device_type = "cpu"
      |   |- reg = <0>
      |   |- clock-frequency = <5f5e1000>
      |   |- linux,boot-cpu
      |   |- linux,phandle = <2>
      |
      o memory@0
      | |- name = "memory"
      | |- device_type = "memory"
      | |- reg = <00000000 00000000 00000000 20000000>
      | |- linux,phandle = <3>
      |
      o chosen
        |- name = "chosen"
        |- bootargs = "root=/dev/sda2"
        |- linux,platform = <00000600>
        |- linux,phandle = <4>
This tree is an example of a minimal tree. It pretty much contains the minimal set of required nodes and properties to boot a Linux kernel, that is some basic model information at the root, the CPUs, the physical memory layout, and misc information passed through /chosen like, in this example, the platform type (mandatory) and the kernel command line arguments (optional).
The /cpus/PowerPC,970@0/linux,boot-cpu property is an example of a property without a value. All other properties have a value. The meaning of the #address-cells and #size-cells properties will be explained in chapter III, which defines precisely the required nodes and properties and their content.
3) Device tree "structure" block
The structure of the device tree is a linearized tree structure. The "OF_DT_BEGIN_NODE" token starts a new node, and the "OF_DT_END_NODE" token ends that node definition. Child nodes are simply defined before "OF_DT_END_NODE" (that is, nodes within the node). A 'token' is a 32 bit value.
Here's the basic structure of a single node:
     * token OF_DT_BEGIN_NODE (that is 0x00000001)
     * node full path as a zero terminated string
     * [align gap to next 4 bytes boundary]
     * for each property:
        * token OF_DT_PROP (that is 0x00000003)
        * 32 bit value of property value size in bytes (or 0 if no value)
        * 32 bit value of offset in string block of property name
        * [align gap to either next 4 bytes boundary if the property value
          size is less or equal to 4 bytes, or to next 8 bytes boundary if
          the property value size is larger than 4 bytes]
        * property value data if any
        * [align gap to next 4 bytes boundary]
     * [child nodes if any]
     * token OF_DT_END_NODE (that is 0x00000002)
So the node content can be summarised as a start token, a full path, a list of properties, a list of child nodes and an end token. Every child node is a full node structure itself as defined above.
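The node layout listed above can be sketched as a handful of emit helpers writing into a caller supplied buffer. This is only an illustration written for this document, not code taken from prom_init.c or fs2dt: the dt_writer type and the dtw_* names are made up, alignment is computed relative to the start of the buffer (so the buffer itself is assumed to be at least 8 byte aligned), and the 4 versus 8 byte rule follows the text above literally.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define OF_DT_BEGIN_NODE 0x1
#define OF_DT_END_NODE   0x2
#define OF_DT_PROP       0x3

/* Hypothetical writer state: err is set once the buffer overflows. */
struct dt_writer { uint8_t *buf; size_t pos, cap; int err; };

static void dtw_bytes(struct dt_writer *w, const void *p, size_t n)
{
    if (w->err || n > w->cap - w->pos) { w->err = 1; return; }
    if (n != 0)
        memcpy(w->buf + w->pos, p, n);
    w->pos += n;
}

/* Zero-fill up to the next multiple of `align` (the "align gap"). */
static void dtw_pad(struct dt_writer *w, size_t align)
{
    static const uint8_t zero;

    while (!w->err && w->pos % align != 0)
        dtw_bytes(w, &zero, 1);
}

/* Tokens, sizes and offsets are all big endian 32 bit values. */
static void dtw_u32(struct dt_writer *w, uint32_t v)
{
    uint8_t be[4] = { (uint8_t)(v >> 24), (uint8_t)(v >> 16),
                      (uint8_t)(v >> 8), (uint8_t)v };

    dtw_bytes(w, be, sizeof be);
}

static void dtw_begin_node(struct dt_writer *w, const char *full_path)
{
    dtw_u32(w, OF_DT_BEGIN_NODE);
    dtw_bytes(w, full_path, strlen(full_path) + 1);
    dtw_pad(w, 4);
}

/* name_off is the offset of the property name in the strings block. */
static void dtw_prop(struct dt_writer *w, uint32_t name_off,
                     const void *value, uint32_t size)
{
    dtw_u32(w, OF_DT_PROP);
    dtw_u32(w, size);
    dtw_u32(w, name_off);
    dtw_pad(w, size > 4 ? 8 : 4);
    dtw_bytes(w, value, size);
    dtw_pad(w, 4);
}

static void dtw_end_node(struct dt_writer *w)
{
    dtw_u32(w, OF_DT_END_NODE);
}
```

Child nodes are produced simply by calling dtw_begin_node() again before the parent's dtw_end_node(), which mirrors the nesting of the format.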
4) Device tree "strings" block
In order to save space, property names, which are generally redundant, are stored separately in the "strings" block. This block is simply the whole bunch of zero terminated strings for all property names concatenated together. The device-tree property definitions in the structure block will contain offset values from the beginning of the strings block.
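A generator has to keep exactly one copy of each property name and hand out its offset. A minimal sketch of that bookkeeping could look like the following; dt_string_offset is a name invented for this example, and the linear search is just the simplest thing that works for a small number of names.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Return the offset of `name` inside the strings block, appending it
 * first if it is not there yet. `*used` is the number of bytes already
 * in the block, `cap` its capacity. Returns UINT32_MAX when the name
 * does not fit. */
static uint32_t dt_string_offset(char *blk, size_t *used, size_t cap,
                                 const char *name)
{
    size_t n = strlen(name) + 1;        /* include the terminating zero */
    size_t off = 0;

    /* The block is a plain concatenation of zero terminated strings. */
    while (off < *used) {
        if (strcmp(blk + off, name) == 0)
            return (uint32_t)off;
        off += strlen(blk + off) + 1;
    }
    if (n > cap - *used)
        return UINT32_MAX;
    off = *used;
    memcpy(blk + off, name, n);
    *used += n;
    return (uint32_t)off;
}
```

The returned value is what goes in the "offset in string block of property name" word of each OF_DT_PROP entry in the structure block.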
III - Required content of the device tree
=========================================
< to be written >
IV - Recommendation for a bootloader
====================================
Here are some various ideas/recommendations that have been proposed while all this has been defined and implemented.
- It should be possible to write a parser that turns an ASCII representation of a device-tree (or even XML though I find that less readable) into a device-tree block. This would basically allow building the device-tree structure and strings "blobs" at bootloader build time, and having the bootloader just pass them as-is to the kernel. In fact, the device-tree blob could then be separate from the bootloader itself, and be placed in a separate portion of the flash that can be "personalized" for different board types by flashing a different device-tree.
- The bootloader may want to be able to use the device-tree itself and may want to manipulate it (to add/edit some properties, like physical memory size or kernel arguments). At this point, 2 choices can be made. Either the bootloader works directly on the flattened format, or the bootloader has its own internal tree representation with pointers (similar to the kernel one) and re-flattens the tree when booting the kernel. The former is a bit more difficult to edit/modify, the latter probably requires a bit more code to handle the tree structure. Note that the structure format has been designed so it's relatively easy to "insert" properties or nodes or delete them by just memmove'ing things around. It contains no internal offsets or pointers for this purpose.
- An example of code for iterating nodes & retrieving properties directly from the flattened tree format can be found in the kernel file arch/ppc64/kernel/prom.c, look at the scan_flat_dt() function, its usage in early_init_devtree(), and the corresponding various early_init_dt_scan_*() callbacks. That code can be re-used in a GPL bootloader, and as the author of that code, I would be happy to discuss possible free licencing to any vendor who wishes to integrate all or part of this code into a non-GPL bootloader.
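To give an idea of how little code is needed to work directly on the flattened format, here is an independent sketch of a property scanner. It is NOT the kernel's scan_flat_dt(); the names dt_scan_props, dt_prop_fn, dt_query and dt_match are invented for this example. It follows the structure block layout from chapter II as written, computes alignment relative to the start of the structure block (which is therefore assumed to be at least 8 byte aligned), and relies on properties always preceding child nodes.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define OF_DT_BEGIN_NODE 0x1
#define OF_DT_END_NODE   0x2
#define OF_DT_PROP       0x3
#define OF_DT_END        0x9

/* Callback type (hypothetical): return non-zero to stop the scan. */
typedef int (*dt_prop_fn)(const char *path, const char *name,
                          const void *value, uint32_t size, void *ctx);

static uint32_t dt_be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

static size_t dt_align(size_t off, size_t a)
{
    return (off + a - 1) & ~(a - 1);
}

/* Walk the structure block and call fn() for every property. Returns
 * fn()'s non-zero value if it stopped the scan, 0 when the tree ended
 * cleanly, -1 on malformed or truncated data. */
static int dt_scan_props(const uint8_t *st, size_t len,
                         const char *strings, size_t strings_len,
                         dt_prop_fn fn, void *ctx)
{
    const char *path = "";
    size_t p = 0;
    int depth = 0;

    while (p + 4 <= len) {
        uint32_t tok = dt_be32(st + p);

        p += 4;
        if (tok == OF_DT_BEGIN_NODE) {
            const uint8_t *nul = memchr(st + p, 0, len - p);

            if (nul == NULL)
                return -1;
            path = (const char *)(st + p);      /* node full path */
            p = dt_align((size_t)(nul - st) + 1, 4);
            depth++;
        } else if (tok == OF_DT_PROP) {
            uint32_t size, name_off;
            int rc;

            if (len - p < 8)
                return -1;
            size = dt_be32(st + p);
            name_off = dt_be32(st + p + 4);
            p = dt_align(p + 8, size > 4 ? 8 : 4);
            if (p > len || size > len - p || name_off >= strings_len)
                return -1;
            rc = fn(path, strings + name_off, st + p, size, ctx);
            if (rc != 0)
                return rc;
            p = dt_align(p + size, 4);
        } else if (tok == OF_DT_END_NODE) {
            if (--depth < 0)
                return -1;
        } else if (tok == OF_DT_END) {
            break;
        } else {
            return -1;
        }
    }
    return depth == 0 ? 0 : -1;
}

/* Example callback: look up one property of one node. */
struct dt_query { const char *path, *name; const void *value; uint32_t size; };

static int dt_match(const char *path, const char *name,
                    const void *value, uint32_t size, void *ctx)
{
    struct dt_query *q = ctx;

    if (strcmp(path, q->path) != 0 || strcmp(name, q->name) != 0)
        return 0;
    q->value = value;
    q->size = size;
    return 1;
}
```

A bootloader that wants to read, say, the "bootargs" property of "/chosen" would fill a dt_query with those two strings and pass dt_match as the callback. Editing a value in place would then be a matter of memmove'ing the tail of the block and fixing up the sizes in the header.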

Ben,
On Wed, 18 May 2005, Benjamin Herrenschmidt wrote:
Here's the very first draft of my HOWTO about booting the linux/ppc64 kernel without open firmware. It's still incomplete, the main chapter
^^^^^^^^^^^^^^^^^^^^^
One could argue whether the full-blown emulation of an OF device tree may really be called this.... ;-)
b) Direct entry with a flattened device-tree block. This entry point is called by a) after the OF trampoline and can also be called directly by a bootloader that does not support the Open Firmware client interface. It is also used by "kexec" to
For OF based systems, what you outline definitely makes an awful lot of sense.
For others I wonder what the costs of this are in terms of the memory footprint (both RAM and ROM). Are there reference implementations in existence?
Regards, Marius

On Wed, 2005-05-18 at 10:12 +0200, Marius Groeger wrote:
Ben,
On Wed, 18 May 2005, Benjamin Herrenschmidt wrote:
Here's the very first draft of my HOWTO about booting the linux/ppc64 kernel without open firmware. It's still incomplete, the main chapter
^^^^^^^^^^^^^^^^^^^^^
One could argue whether the full-blown emulation of an OF device tree may really be called this.... ;-)
You must be kidding :)
Honestly, a device tree is small and rather simple to layout, and would fix most of the issues with piling up crap like incompatible boot_info structures and that sort of thing that plague the ppc32 kernel.
A full blown implementation of OF is a lot bigger. It requires at least 3 different interfaces (the user interface, the fcode interface, the client interface), along with all the bits & pieces to get a full runtime environment.
b) Direct entry with a flattened device-tree block. This entry point is called by a) after the OF trampoline and can also be called directly by a bootloader that does not support the Open Firmware client interface. It is also used by "kexec" to
For OF based systems, what you outline definitely makes an awful lot of sense.
How so ? OF based system just implement the OF interface...
For others I wonder what the costs of this are in terms of the memory footprint (both RAM and ROM). Are there reference implementations in existence?
You may not have noticed (well, I haven't filled part III yet so it may not be clear), but I'm only making a very small subset of the device-tree mandatory, though I do encourage people to provide as complete a tree as possible.
For example, I will definitely not require the bootloader to provide a full tree of PCI devices, only host bridges, in order to get interrupt routing and resource mapping. However, I encourage people to put things like on-chip devices in there, it makes everything much more flexible.
Regarding the cost, well, the device-tree itself is fairly small, maybe a couple of pages for a minimum one. As I wrote, embedded boards can decide to have it built at bootloader build time, and simply embedded as a blob in the firmware and passed along to the kernel, that is 0 firmware code. However, it would be simple to add minimum capabilities to the firmware for editing/adding properties (for things like memory size or kernel command line).
I wonder sometimes why people are so "afraid" of the device-tree concept... it's really simple, does not require that much code, and makes everything so much more flexible in the long run.
Ben.

On Thu, 19 May 2005, Benjamin Herrenschmidt wrote:
Here's the very first draft of my HOWTO about booting the linux/ppc64 kernel without open firmware. It's still incomplete, the main chapter
^^^^^^^^^^^^^^^^^^^^^
One could argue whether the full-blown emulation of an OF device tree may really be called this.... ;-)
You must be kidding :)
Honestly, a device tree is small and rather simple to layout, and would fix most of the issues with piling up crap like incompatible boot_info structures and that sort of thing that plague the ppc32 kernel.
Yes, I know, and I *was* kidding. :-) What I was trying to hint at, really, was that there is just a bit more than resemblance to what real OF based systems will provide for a device tree. And rightly so, no need to make it difficult for those.
For OF based systems, what you outline definitely makes an awful lot of sense.
How so ? OF based system just implement the OF interface...
Er, yes, and that is why it makes sense to design it that way. Maybe striking out the word "awful" makes my statement clearer :-)
Regarding the cost, well, the device-tree itself is fairly small, maybe a couple of pages for a minimum one. As I wrote, embedded boards can
Without knowing the size of the code required for this, it would still mean an increase by a couple of hundred percent for the boot information.
I wonder sometimes why people are so "afraid" of the device-tree concept... it's really simple, does not require that much code, and makes everything so much more flexible in the long run.
Oh, don't get me wrong: I'm not against the device tree per se, I was just pondering a little on your draft, according to the "RFC" bit in the subject. Actually I welcome your efforts a lot, since I, too, suffered from the mess we are currently in. So, by all means, please do go on!
Cheers, Marius

Without knowing the size of the code required for this, it would still mean an increase by a couple of hundred percent for the boot information.
Well, if you build the device-tree blob at bootloader build time (you can then embed it in your bootloader or maybe just put it somewhere in flash), there is little code involved, basically passing a pointer to it to the kernel. Now, if you mean the kernel code, oh well, have you seen how big a ppc64 kernel is anyway ? :)
I would expect something like uboot to be a bit more smart though and provide optionally some functions to add nodes/properties, but heh, we'll see. I'll try to provide example code after I'm done with the spec part.
Ben.

In message 1116498144.918.97.camel@gaston you wrote:
Without knowing the size of the code required for this, it would still mean an increase by a couple of hundred percent for the boot information.
Well, if you build the device-tree blob at bootloader build time (you can then embed it in your bootloader or maybe just put it somewhere in flash), there is little code involved, basically passing a pointer to it to the kernel. Now, if you mean the kernel code, oh well, have you seen how big a ppc64 kernel is anyway ? :)
Marius was talking about the amount of data passed to the kernel.
And yes, we are aware how big a ppc64 kernel is. One might argue that you need a 64 bit kernel only for big systems, so resources are cheap. On the other hand, we are also aware how big the 2.6 kernel is compared against 2.4, and how it suffers performancewise.
My concern is that just adding a few kB of code here and there and passing a bit more data from A to B and using ASCII representation for the data and all of that will result in elegant and easily maintainable code on one side, but to even bigger memory footprints for boot loader and kernel and longer boot times on the other side, too. We have seen before how this works.
A few tens or hundreds of milliseconds of boot time may not mean anything on a fast 64 bit machine which will spend ages anyway while scanning a lot of SCSI busses and all that, but it will *hurt* on many embedded systems.
I would expect something like uboot to be a bit more smart though and provide optionally some functions to add nodes/properties, but heh, we'll see. I'll try to provide example code after I'm done with the spec part.
It's not only an issue of being smart enough. It has also a lot to do with hardware restrictions. If you have a product that sells several 1e4 or 1e5 units per year which now works with just 4 MB of flash for boot loader and Linux kernel and application code you will have a hard time explaining that the next software generation will need bigger (and more expensive) flashes just because of using more elegant code.
Yes, small *is* beautiful.
We had this discussion before, several times. There once was a proposal by Mark A. Greer (see discussion on the linuxppc-embedded mailing list that started as "EV-64260-BP & GT64260 bi_recs" around March 19, 2002) which was elegant, flexible and lean. If it was not actually sad it could be funny that the general agreement will always end up to be the biggest and slowest of all possible solutions.
But my biggest concern here on the U-Boot list is: U-Boot is not only for PowerPC systems. We should also keep an eye on what ARM and MIPS is doing... See my other posting.
Best regards,
Wolfgang Denk

On Thu, May 19, 2005 at 03:18:41PM +0200, Wolfgang Denk was heard to remark:
It's not only an issue of being smart enough. It has also a lot to do with hardware restrictions. If you have a product that sells several 1e4 or 1e5 units per year which now works with just 4 MB of flash for boot loader and Linux kernel and application code you will have a hard time explaining that the next software generation will need bigger (and more expensive) flashes just because of using more elegant code.
Yes, small *is* beautiful.
:-/ I was once very dissatisfied with an earlier job I had because the boss kept trying to make me use a "rabbitcore" which had only 1MB for everything, and there was no way I'd be able to fit Linux into that.
Rabbitcore ran some tiny thing called rabbitOS, but the tools were all on Windows. :( This was only in 2003, and I still see ads for the rabbit in magazines.
--linas

On May 19, 2005, at 3:37 PM, Linas Vepstas wrote:
:-/ I was once very dissatisfied with an earlier job I had because the boss kept trying to make me use a "rabbitcore" which had only 1MB for everything, and there was no way I'd be able to fit Linux into that.
People understand the trade off of the need for resources to get the features they want, which is why they choose Linux in the first place. Yes, sometimes people ask for what seems to be unreasonable in such products, but it also forces us to be clever about how we configure the systems.
The difficult trade off is when someone states they get the same feature set with one particular piece of software as they do with Linux, but in much less space. The advantage of Linux is open source and no royalties, but many of the RTOS systems these days no longer have royalties, just a one time up front cost. When they weigh that against the extra cost of memory for Linux and the number of systems, the Linux "royalty" is more than the purchase of the competing OS. It's really happening that way today.
Thanks.
-- Dan

Marius was talking about the amount of data passed to the kernel.
A few kB maybe... Current implementations always provide a full featured device-tree with PCI devices so they aren't a good example (and I don't have numbers in mind at the moment). I'll try to get some later today. The property names are factored out (only one copy of a given name) to avoid bloat, and the node format is very compact. A small device-tree would be only about a dozen nodes (the minimal is 5 nodes including the root) with only a few properties.
And yes, we are aware how big a ppc64 kernel is. One might argue that you need a 64 bit kernel only for big systems, so resources are cheap. On the other hand, we are also aware how big the 2.6 kernel is compared against 2.4, and how it suffers performancewise.
I wouldn't say it suffered performance wise on all architectures. Small embedded CPUs may have suffered (mostly because of larger memory footprint impact on small TLBs), but ppc64 is definitely not something you ever want to use with a 2.4 kernel, and I would expect 2.6 to be faster on 6xx/7xx/7xxx type CPUs as well.
My concern is that just adding a few kB of code here and there and passing a bit more data from A to B and using ASCII representation for the data and all of that will result in elegant and easily maintainable code on one side, but to even bigger memory footprints for boot loader and kernel and longer boot times on the other side, too. We have seen before how this works.
I don't think it will have any significant impact on the boot time. Not at all. In fact, I'm not even sure the code would be that much bigger either. For example, all the code needed to declare all the device-specific platform devices used in some cases would be _replaced_ by a generic routine that declares a device based on the device-tree data, that sort of thing. I honestly cannot tell what kind of bloat is to be expected, but I really don't think it will be relevant.
Even the code for iterating the fully expanded device-tree & properties in the kernel isn't big, but as I wrote earlier, a non-ppc architecture wanting to use that proposal may want to work directly on the flattened tree.
I REALLY think people are over-estimating the size & complexity of the device-tree.
A few tens or hundreds of milliseconds of boot time may not mean anything on a fast 64 bit machine which will spend ages anyway while scanning a lot of SCSI busses and all that, but it will *hurt* on many embedded systems.
I wouldn't even expect that much.
I would expect something like uboot to be a bit more smart though and provide optionally some functions to add nodes/properties, but heh, we'll see. I'll try to provide example code after I'm done with the spec part.
It's not only an issue of being smart enough. It has also a lot to do with hardware restrictions. If you have a product that sells several 1e4 or 1e5 units per year which now works with just 4 MB of flash for boot loader and Linux kernel and application code you will have a hard time explaining that the next software generation will need bigger (and more expensive) flashes just because of using more elegant code.
Yes, small *is* beautiful.
Did you read the "optional" above ? Let me repeat _AGAIN_ here: the bootloader doesn't _need_ ANY code to deal with the device-tree if you decide to just build the blob once for all, and embed it "as is". However, not everybody is fighting after 10 bytes of flash, and thus it would be useful if optionally, uboot could provide the machine specific code with functions to do things like edit the memory or bootargs properties in there.
We had this discussion before, several times. There once was a proposal by Mark A. Greer (see discussion on the linuxppc-embedded mailing list that started as "EV-64260-BP & GT64260 bi_recs" around March 19, 2002) which was elegant, flexible and lean. If it were not actually sad it could be funny that the general agreement always ends up being the biggest and slowest of all possible solutions.
Fuck it ! This is not by far the biggest and slowest of all solutions, the tree format is on purpose very compact, it's not a few strings that will make that much of a difference, damn . Do you really want me to propose ACPI AML instead ?
Besides, I know the bi_rec stuff well as I proposed it in the first place, and nobody ever came to an agreement about that either.
Face it, there will NOT be any other way that will be accepted upstream to boot a ppc64 kernel.
But my biggest concern here on the U-Boot list is: U-Boot is not only for PowerPC systems. We should also keep an eye on what ARM and MIPS is doing... See my other posting.
Sure, you are welcome to do so. I'm posting to this list because of Marvell's intent to use uboot as a bootloader for what appears to be the first ppc64 platform not to implement the OF command line interface.
Ben.

In message 1116541993.5153.22.camel@gaston you wrote:
Yes, small *is* beautiful.
Did you read the "optional" above ? Let me repeat _AGAIN_ here: the bootloader doesn't _need_ ANY code to deal with the device-tree if you decide to just build the blob once for all, and embed it "as is".
And the blob has a zero memory footprint or what?
Face it, there will NOT be any other way that will be accepted upstream to boot a ppc64 kernel.
Then let's just stop here. We're just wasting time if there is nothing to discuss any more.
Best regards,
Wolfgang Denk

On Fri, 2005-05-20 at 01:20 +0200, Wolfgang Denk wrote:
In message 1116541993.5153.22.camel@gaston you wrote:
Yes, small *is* beautiful.
Did you read the "optional" above ? Let me repeat _AGAIN_ here: the bootloader doesn't _need_ ANY code to deal with the device-tree if you decide to just build the blob once for all, and embed it "as is".
And the blob has a zero memory footprint or what?
Don't be ridiculous please. But definitely a small one.
Face it, there will NOT be any other way that will be accepted upstream to boot a ppc64 kernel.
Then let's just stop here. We're just wasting time if there is nothing to discuss any more.
You are welcome to discuss aspects of the content of the proposal.
Ben.

On Fri, 2005-05-20 at 08:33 +1000, Benjamin Herrenschmidt wrote:
Marius was talking about the amount of data passed to the kernel.
A few Kb maybe... Current implementations always provide a full featured device-tree with pci devices so they aren't a good example (and I don't have numbers in mind at the moment). I'll try to get some later today. The property names are factored out (only one copy of a given name) to avoid bloat, and the node format is very compact. A small device-tree would be only about a dozen nodes (the minimum is 5 nodes including the root) with only a few properties.
Ok, I got some numbers here. (I have removed the page alignment constraint for the DT block and the strings block in the "blob" passed in btw, I forgot to update v2)
- The minimal device-tree given as an example in the document (identical to the one in v2 of the document, which means it may shrink even more, see below) fits in a blob (complete with header) of 764 bytes.
- The complete device-tree of my PowerMac laptop (this is _huge_, Apple puts a _lot_ of stuff in there, way more than most embedded boards, even the most complex ones, will ever need) fits into a 37k blob.
I will come up with more numbers soon including a good "average" example that is a Maple board with all the ISA/serial stuff (which is very useful to have there) but without the individual PCI devices.
On an additional note, I'm also rev'ing up the blob format with additional space savings in mind:
- Current version is 2. That's what the kernel recognises and what current kexec tools generate (well... they actually generate a version 1 but the difference is minor).
- Version 3 will be backward compatible and just adds a "string table size" field to the header to help the kernel do better memory management with the flattened device-tree. kexec can implement it, older kernels will still understand the tree.
- Version 16 will not be backward compatible (it will require kernel patches, but that should be ok for new board vendors) and allows more space savings. For this version, I'm planning the following changes for now:
  * Relax some alignment restrictions (already did it for the numbers above).
  * Allow replacing the full path string with only the "name@unit address" part, letting the kernel reconstruct the full path. With this change, the "name" property can be dropped from each node as well, as it can be reconstructed by the kernel. There is a lot of redundancy in the full path, so that should save a bit. A side effect is also to remove any name requirement for the root node.
  * Make the "linux,phandle" property optional. It will only be required for nodes that are referenced by another node using a phandle value (typically, nodes that are part of the interrupt tree).
With those changes, the example minimal tree may shrink down to about 600 bytes (rough estimate), which would mean an average tree with a few devices would be between 1 and 3 Kb (rough estimate too).
Ben.

On Fri, 20 May 2005, Benjamin Herrenschmidt wrote:
Ok, I got some numbers here. (I have removed the page alignment
Thanks!
I think we'll just have to try all that out for ourselves. Simple boards will probably be at the lower end of your figures, which *should* be fine for most people.
How do you view this, though: couldn't it happen in the future, once the dev-tree has been widely established, that more and more drivers are converted to pull their properties off the tree, because it is so convenient? That *could* lead to rising expectations toward the firmware, and make what once was a small blob a big blob. Is it reasonable to assume drivers will #ifdef such behaviour?
Again, I'm just thinking here, no opinions yet. Well, if you want one: <opinion> Actually I always liked the idea of clever firmware, which usually knows the underlying hardware best. </opinion>
- The complete device-tree of my PowerMac laptop (this is _huge_, Apple
puts a _lot_ of stuff in there, way more than most embedded board even the most complex ones will ever need) fits into a 37k blob.
Don't underestimate embedded hardware. The MPC5554 has 286(!) selectable-priority interrupt sources... :-)
Cheers, Marius

On Fri, May 20, 2005 at 09:11:22AM +0200, Marius Groeger wrote:
On Fri, 20 May 2005, Benjamin Herrenschmidt wrote:
Ok, I got some numbers here. (I have removed the page alignment
Thanks!
I think we'll just have to try all that out for ourselves. Simple boards will probably be at the lower end of your figures, which *should* be fine for most people.
How do you view this, though: couldn't it happen in the future, once the dev-tree has been widely established, that more and more drivers are converted to pull their properties off the tree, because it is so convenient? That *could* lead to rising expectations toward the firmware, and make what once was a small blob a big blob. Is it reasonable to assume drivers will #ifdef such behaviour?
Bear in mind that if a driver chooses to take its information from the device tree, it's presumably because the code is simpler that way. Which means any such increase in the necessary device tree size is (at least partially) offset by a reduction in code size..

On Fri, 2005-05-20 at 09:11 +0200, Marius Groeger wrote:
On Fri, 20 May 2005, Benjamin Herrenschmidt wrote:
Ok, I got some numbers here. (I have removed the page alignment
Thanks!
I think we'll just have to try all that out for ourselves. Simple boards will probably be at the lower end of your figures, which *should* be fine for most people.
I expect so. Apple device-trees are really bloated :) I'm also rev'ing up the format to be even a bit more compact. On the other hand, the low figure I posted is a really very minimal tree with no device at all in it. It would be interesting to see what Marvell comes up with.
How do you view this, though: couldn't it happen in the future, once the dev-tree has been widely established, that more and more drivers are converted to pull their properties off the tree, because it is so convenient? That *could* lead to rising expectations toward the firmware, and make what once was a small blob a big blob. Is it reasonable to assume drivers will #ifdef such behaviour?
It's very difficult to foresee. But most modern busses like PCI, PCIX, PCIE etc... have their own "probing" facilities and as such don't need devices to be present in the tree. (It is handy to put some there when ancillary data has to be passed along, like MAC addresses, but that isn't mandatory at this point). It would be nice however that busses without those facilities (or pseudo busses), like on chip devices or superio chips, expose their internals via the device-tree, but again, there is no need to bloat them with a gazillion properties. Just the basics needed to be identified and matched to a driver, plus address/ports/interrupts mappings. In fact, the device-tree "bloat" to expose this info may well be less than the code bloat for hard-coding all possible combinations in the kernel, especially if you want a given kernel image to deal with more than one revision of a board (which you _really_ want, or am I the only one to have had bad experiences with production and customers screwing up updates in the past ?)
Again, I'm just thinking here, no opinions yet. Well, if you want one: <opinion> Actually I always liked the idea of clever firmware, which usually knows the underlying hardware best. </opinion>
The goal of this compact format is to allow for both clever and non-clever firmwares. You can have a pre-built device-tree "blob" that you just pass around, or really build one on the fly, though in that latter case, it may be worth simply implementing the OF client interface :)
I'm also hoping there will be soon an open source release of a complete Open Firmware implementation (fully in forth/fcode on top of the engine) though I really can't tell much more about it at this point, and there is the openbios project which also aims to be an OF implementation (that one using a lot of C code)
Don't underestimate embedded hardware. The MPC5554 has 286(!) selectable-priority interrupt sources... :-)
Yes, but you don't need a node for each of them, nor even a property :) You only need typically an interrupt-related property per device having an interrupt (a given property can contain values for several interrupts if a device has more than one) or per bridge for interrupt-maps (like PCI). Though if you actually _use_ all of them (like wire 200 GPIOs used as IRQs on your board or such thing :), well, it may be worth spending a few Kb's of device-tree to avoid a hard coding mess in your kernel.
Cheers, Marius

For others I wonder what the costs of this are in terms of the memory footprint (both RAM and ROM). Are there reference implementations in existence?
Oh, and to complete my answer, no, there isn't per-se a reference implementation yet. What exists so far, outside of actual full fledged OF implementations, is the IBM PIBS firmware for embedded, which implements the full OF client interface, and the kexec tools using the flattened format. The reason why I'm writing this document is precisely to get that development started as part of uboot. As was said earlier, no new board support code will be accepted upstream if it doesn't use a device-tree. This decision was taken a while ago and will not be changed.
There is IBM internal code used for bringup that implements this, so I can confirm it works :) But unfortunately, none of it can be distributed at the moment, and thus it doesn't constitute a reference implementation.
Ben.

On Wed, 2005-05-18 at 17:09 +1000, Benjamin Herrenschmidt wrote:
Hi !
Here's the very first draft of my HOWTO about booting the linux/ppc64 kernel without open firmware. It's still incomplete, the main chapter describing which nodes & properties are required and their format is still missing (though it will basically be a subset of the Open Firmware specification & bindings). The format of the flattened device-tree is documented.
And here is a second draft with more infos.
Booting the Linux/ppc64 kernel without Open Firmware
----------------------------------------------------
(c) 2005 Benjamin Herrenschmidt benh@kernel.crashing.org, IBM Corp.
May 18, 2005: Rev 0.1 - Initial draft, no chapter III yet.
May 19, 2005: Rev 0.2 - Add chapter III and bits & pieces here and there; clarify the fact that a lot of things are optional: the kernel only requires a very small device tree, though it is encouraged to provide one that is as complete as possible.
I- Introduction
===============
During the recent developments of the Linux/ppc64 kernel, and more specifically, the addition of new platform types outside of the old IBM pSeries/iSeries pair, it was decided to enforce some strict rules regarding the kernel entry and bootloader <-> kernel interfaces, in order to avoid the degeneration that has become the ppc32 kernel entry point and the way a new platform should be added to the kernel. The legacy iSeries platform breaks those rules as it predates this scheme, but no new board support will be accepted in the main tree that doesn't follow them properly.
The main requirement, which will be defined in more detail below, is the presence of a device-tree whose format is defined after the Open Firmware specification. However, in order to make life easier for embedded board vendors, the kernel doesn't require the device-tree to represent every device in the system and only requires some nodes and properties to be present. This will be described in detail in section III, but, for example, the kernel does not require you to create a node for every PCI device in the system. It is a requirement to have a node for PCI host bridges in order to provide interrupt routing information and memory/IO ranges, among others. It is also recommended to define nodes for on chip devices and other busses that don't specifically fit in an existing OF specification. This creates a great flexibility in the way the kernel can then probe those and match drivers to devices, without having to hard code all sorts of tables. It also makes it more flexible for board vendors to do minor hardware upgrades without significantly impacting the kernel code or cluttering it with special cases.
1) Entry point
--------------
There is one and only one entry point to the kernel, at the start of the kernel image. That entry point supports two calling conventions:
a) Boot from Open Firmware. If your firmware is compatible with Open Firmware (IEEE 1275) or provides an OF compatible client interface API (support for "interpret" callback of forth words isn't required), you can enter the kernel with:
r5 : OF callback pointer as defined by IEEE 1275 bindings to powerpc. Only the 32 bits client interface is currently supported
r3, r4 : address & length of an initrd if any or 0
MMU is either on or off; the kernel will run the trampoline located in arch/ppc64/kernel/prom_init.c to extract the device-tree and other information from open firmware and build a flattened device-tree as described in b). prom_init() will then re-enter the kernel using the second method. This trampoline code runs in the context of the firmware, which is supposed to handle all exceptions during that time.
b) Direct entry with a flattened device-tree block. This entry point is called by a) after the OF trampoline and can also be called directly by a bootloader that does not support the Open Firmware client interface. It is also used by "kexec" to implement "hot" booting of a new kernel from a previous running one. This method is what I will describe in more detail in this document, as method a) is simply standard Open Firmware, and thus should be implemented according to the various standard documents defining it and its binding to the PowerPC platform. The entry point definition then becomes:
r3 : physical pointer to the device-tree block (defined in chapter II)
r4 : physical pointer to the kernel itself. This is used by the assembly code to properly disable the MMU in case you are entering the kernel with MMU enabled and a non-1:1 mapping.
r5 : NULL (to differentiate it from method a)
Note about SMP entry: Either your firmware puts your other CPUs in some sleep loop or spin loop in ROM where you can get them out via a soft reset or some other means, in which case you don't need to care, or you'll have to enter the kernel with all CPUs. The way to do that with method b) will be described in a later revision of this document.
2) Board support
----------------
Board supports (platforms) are not exclusive config options. An arbitrary set of board supports can be built in a single kernel image. The kernel will "know" what set of functions to use for a given platform based on the content of the device-tree. Thus, you should:
a) add your platform support as a _boolean_ option in arch/ppc64/Kconfig, following the example of PPC_PSERIES, PPC_PMAC and PPC_MAPLE. The latter is probably a good example of a board support to start from.
b) create your main platform file as "arch/ppc64/kernel/myboard_setup.c" and add it to the Makefile under the condition of your CONFIG_ option. This file will define a structure of type "ppc_md" containing the various callbacks that the generic code will use to get to your platform specific code.
c) add a reference to your "ppc_md" structure in the "machines" table in arch/ppc64/kernel/setup.c
d) request and get assigned a platform number (see PLATFORM_* constants in include/asm-ppc64/processor.h)
I will describe later the boot process and various callbacks that your platform should implement.
II - The DT block format
===========================
This chapter defines the actual format of the flattened device-tree passed to the kernel. The actual content of it and kernel requirements are described later. You can find examples of code manipulating that format in various places, including arch/ppc64/kernel/prom_init.c which will generate a flattened device-tree from the Open Firmware representation, or the fs2dt utility which is part of the kexec tools which will generate one from a filesystem representation. It is expected that a bootloader like uboot provides a bit more support, that will be discussed later as well.
1) Header
---------
The kernel is entered with r3 pointing to an area of memory that is roughly described in include/asm-ppc64/prom.h by the structure boot_param_header:
struct boot_param_header {
        u32     magic;                  /* magic word OF_DT_HEADER */
        u32     totalsize;              /* total size of DT block */
        u32     off_dt_struct;          /* offset to structure */
        u32     off_dt_strings;         /* offset to strings */
        u32     off_mem_rsvmap;         /* offset to memory reserve map */
        u32     version;                /* format version */
        u32     last_comp_version;      /* last compatible version */
        /* version 2 fields below */
        u32     boot_cpuid_phys;        /* Which physical CPU id we're booting on */
};
Along with the constants:
/* Definitions used by the flattened device tree */
#define OF_DT_HEADER            0xd00dfeed      /* 4: version, 4: total size */
#define OF_DT_BEGIN_NODE        0x1             /* Start node: full name */
#define OF_DT_END_NODE          0x2             /* End node */
#define OF_DT_PROP              0x3             /* Property: name off, size, content */
#define OF_DT_END               0x9
All values in this header are in big endian format. The various fields in this header are defined more precisely below. All "offsets" values are in bytes from the start of the header, that is from the value of r3.
- magic
This is a magic value that "marks" the beginning of the device-tree block header. It contains the value 0xd00dfeed and is defined by the constant OF_DT_HEADER
- totalsize
This is the total size of the DT block including the header. The "DT" block should enclose all data structures defined in this chapter (which are pointed to by offsets in this header). That is, the device-tree structure, strings, and the memory reserve map.
- off_dt_struct
This is an offset from the beginning of the header to the start of the "structure" part of the device-tree (see 2) device tree).
- off_dt_strings
This is an offset from the beginning of the header to the start of the "strings" part of the device-tree
- off_mem_rsvmap
This is an offset from the beginning of the header to the start of the reserved memory map. This map is a list of pairs of 64 bits integers. Each pair is a physical address and a size. The list is terminated by an entry of size 0. This map provides the kernel with a list of physical memory areas that are "reserved" and thus not to be used for memory allocations, especially during early initialisation. The kernel needs to allocate memory during boot for things like un-flattening the device-tree, allocating an MMU hash table, etc... Those allocations must be done in such a way as to avoid overwriting critical things like, on Open Firmware capable machines, the RTAS instance, or on some pSeries, the TCE tables used for the iommu. Typically, the reserve map should contain _at least_ this DT block itself (header,total_size). If you are passing an initrd to the kernel, you should reserve it as well. You do not need to reserve the kernel image itself. The map should be 64 bits aligned.
- version
This is the version of this structure. Version 1 stops here. Version 2 adds an additional field boot_cpuid_phys. You should always generate a structure of the highest version defined at the time of your implementation. That is version 2.
- last_comp_version
Last compatible version. This indicates down to what version of the DT block you are backward compatible with. For example, version 2 is backward compatible with version 1 (that is, a kernel built for version 1 will be able to boot with a version 2 format). You should put a 1 in this field unless a new incompatible version of the DT block is defined.
- boot_cpuid_phys
This field only exists on version 2 headers. It indicates which physical CPU ID is calling the kernel entry point. This is used, among others, by kexec. If you are on an SMP system, this value should match the content of the "reg" property of the CPU node in the device-tree corresponding to the CPU calling the kernel entry point (see further chapters for more information on the required device-tree contents).
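As an illustration of the field descriptions above, here is a minimal sketch in C of how a consumer of the blob might decode and sanity-check the header. This is not the kernel's code: the names be32_at, dt_header_info and dt_check_header are hypothetical, and the byte offsets simply follow the field order of struct boot_param_header, with every field read as a big endian 32 bits value.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define OF_DT_HEADER 0xd00dfeedu

/* Read a big endian 32 bits value from an arbitrary byte pointer. */
static uint32_t be32_at(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Host-endian copy of the header fields (hypothetical helper type). */
struct dt_header_info {
    uint32_t totalsize;
    uint32_t off_dt_struct;
    uint32_t off_dt_strings;
    uint32_t off_mem_rsvmap;
    uint32_t version;
    uint32_t last_comp_version;
    uint32_t boot_cpuid_phys;   /* only meaningful when version >= 2 */
};

/*
 * Decode and sanity-check a DT block header.
 * "supported" is the highest format version the caller understands.
 * Returns 0 on success, -1 if the block cannot be used.
 */
static int dt_check_header(const uint8_t *blob, size_t len,
                           uint32_t supported, struct dt_header_info *out)
{
    assert(blob != NULL && out != NULL);

    /* A version 1 header has 7 fields, that is 28 bytes. */
    if (len < 28 || be32_at(blob) != OF_DT_HEADER)
        return -1;

    out->totalsize         = be32_at(blob + 4);
    out->off_dt_struct     = be32_at(blob + 8);
    out->off_dt_strings    = be32_at(blob + 12);
    out->off_mem_rsvmap    = be32_at(blob + 16);
    out->version           = be32_at(blob + 20);
    out->last_comp_version = be32_at(blob + 24);
    out->boot_cpuid_phys   = 0;

    /* Usable only if backward compatible with a version we understand. */
    if (out->last_comp_version > supported)
        return -1;

    /* All offsets are relative to the header and must stay in the block. */
    if (out->totalsize > len ||
        out->off_dt_struct  >= out->totalsize ||
        out->off_dt_strings >= out->totalsize ||
        out->off_mem_rsvmap >= out->totalsize)
        return -1;

    /* Version 2 appends boot_cpuid_phys as an 8th field. */
    if (out->version >= 2) {
        if (len < 32)
            return -1;
        out->boot_cpuid_phys = be32_at(blob + 28);
    }
    return 0;
}
```

Reading the fields byte by byte keeps the sketch independent of the endianness and alignment rules of the machine running it, which is convenient for a host-side tool that builds or checks blobs.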
So the typical layout of a DT block (though the various parts don't need to be in that order) looks like (addresses go from top to bottom):
             ------------------------------
       r3 -> |  struct boot_param_header  |
             ------------------------------
             |      (alignment gap) (*)   |
             ------------------------------
             |      memory reserve map    |
             ------------------------------
             |      (alignment gap)       |
             ------------------------------
             |                            |
             |    device-tree structure   |
             |                            |
             ------------------------------
             |      (alignment gap)       |
             ------------------------------
             |                            |
             |     device-tree strings    |
             |                            |
      -----> ------------------------------
      |
      |
      --- (r3 + totalsize)
(*) The alignment gaps are not necessarily present, their presence and size are dependent on the various alignment requirements of the individual data blocks.
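The memory reserve map described under off_mem_rsvmap can be walked with very little code. The following is a minimal sketch, assuming the map is a sequence of (address, size) pairs of big endian 64 bits integers ending with an entry of size 0, as stated above. The names be64_at and dt_range_is_reserved are hypothetical, and the max_entries bound is only there so that a corrupted map cannot make the loop run away.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Read a big endian 64 bits value from an arbitrary byte pointer. */
static uint64_t be64_at(const uint8_t *p)
{
    uint64_t v = 0;
    for (int i = 0; i < 8; i++)
        v = (v << 8) | p[i];
    return v;
}

/*
 * Return 1 if [addr, addr + size) overlaps any entry of the reserve
 * map, 0 otherwise. "map" points at the first (address, size) pair.
 * Address wrap-around is not handled in this sketch.
 */
static int dt_range_is_reserved(const uint8_t *map, size_t max_entries,
                                uint64_t addr, uint64_t size)
{
    assert(map != NULL);

    for (size_t i = 0; i < max_entries; i++) {
        uint64_t r_addr = be64_at(map + 16 * i);
        uint64_t r_size = be64_at(map + 16 * i + 8);

        if (r_size == 0)        /* terminating entry */
            break;
        if (addr < r_addr + r_size && r_addr < addr + size)
            return 1;
    }
    return 0;
}
```

An early boot allocator could call such a check before handing out a candidate range, which is the kind of use the reserve map is meant for.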
2) Device tree generalities ---------------------------
This device-tree itself is separated in two different blocks, a structure block and a strings block. Both need to be page aligned.
First, let's quickly describe the device-tree concept before detailing the storage format. This chapter does _not_ describe the detail of the required types of nodes & properties for the kernel, this is done later in chapter III.
The device-tree layout is strongly inherited from the definition of the Open Firmware IEEE 1275 device-tree. It's basically a tree of nodes, each node having two or more named properties. A property can have a value or not.
It is a tree, so each node has one and only one parent except for the root node, which has no parent.
A node has 2 names. The actual node name is contained in a property of type "name" in the node property list whose value is a zero terminated string and is mandatory. There is also a "unit name" that is used to differentiate nodes with the same name at the same level. It is usually made of the node's name, the "@" sign, and a "unit address", whose definition is specific to the bus type the node sits on. The unit name doesn't exist as a property per-se but is included in the device-tree structure. It is typically used to represent "path" in the device-tree. More details about these will be provided later. The kernel ppc64 generic code does not make any formal use of the unit address (though some board support code may do), so the only real requirement here for the unit address is to ensure uniqueness of the node unit name at a given level. Nodes with no notion of address and no possible sibling of the same name (like /memory or /cpus) may omit the unit address in the context of this specification, or use the "@0" default unit address. The unit name is used to define a node "full path", which is the concatenation of all parent nodes' unit names separated with "/".
The root node is defined as being named "device-tree" and has no unit address (no @ symbol followed by a unit address). When manipulating device-tree "path", the root of the tree is generally represented by a simple slash sign "/".
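To make the "full path" rule concrete, here is a small sketch of how a tool generating a tree might derive a child's full path from its parent's path and its own unit name. The function name dt_child_path is hypothetical. The only special case is the root, which is written "/" so that its children become "/cpus" rather than "//cpus".

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Build the full path of a child node from its parent's full path and
 * the child's unit name ("name" or "name@unit-address").
 * "out" must not overlap "parent".
 * Returns 0 on success, -1 if "out" is too small.
 */
static int dt_child_path(const char *parent, const char *unit_name,
                         char *out, size_t out_len)
{
    assert(parent != NULL && unit_name != NULL && out != NULL);

    /* The root path already ends with the separator. */
    const char *sep = (strcmp(parent, "/") == 0) ? "" : "/";
    int n = snprintf(out, out_len, "%s%s%s", parent, sep, unit_name);

    return (n < 0 || (size_t)n >= out_len) ? -1 : 0;
}
```

For instance, starting from "/" with the unit name "cpus" and then "PowerPC,970@0" yields "/cpus/PowerPC,970@0", which matches the way the CPU node of the example tree is referred to in this document.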
Every node that actually represents an actual device (that is, one that isn't only a virtual "container" for more nodes, like "/cpus" is) is also required to have a "device_type" property indicating the type of node.
Finally, every node is required to have a "linux,phandle" property. Real open firmware implementations don't provide it as it's generated on the fly by the prom_init.c trampoline from the Open Firmware "phandle". Implementations providing a flattened device-tree directly should provide this property. This property is a 32 bits value that uniquely identifies a node. You are free to use whatever values or system of values, internal pointers, or whatever to generate these, the only requirement is that every single node of the tree you are passing to the kernel has a unique value in this property.
This can be used in some cases for nodes to reference other nodes.
Here is an example of a simple device-tree. In this example, a "o" designates a node followed by the node unit name. Properties are presented with their name followed by their content. "content" represents an ASCII string (zero terminated) value, while <content> represents a 32 bits hexadecimal value. The various nodes in this example will be discussed in a later chapter. At this point, it is only meant to give you an idea of what a device-tree looks like.
  / o device-tree
      |- name = "device-tree"
      |- model = "MyBoardName"
      |- compatible = "MyBoardFamilyName"
      |- #address-cells = <2>
      |- #size-cells = <2>
      |- linux,phandle = <0>
      |
      o cpus
      | | - name = "cpus"
      | | - linux,phandle = <1>
      | | - #address-cells = <1>
      | | - #size-cells = <0>
      | |
      | o PowerPC,970@0
      |   |- name = "PowerPC,970"
      |   |- device_type = "cpu"
      |   |- reg = <0>
      |   |- clock-frequency = <5f5e1000>
      |   |- linux,boot-cpu
      |   |- linux,phandle = <2>
      |
      o memory@0
      | |- name = "memory"
      | |- device_type = "memory"
      | |- reg = <00000000 00000000 00000000 20000000>
      | |- linux,phandle = <3>
      |
      o chosen
        |- name = "chosen"
        |- bootargs = "root=/dev/sda2"
        |- linux,platform = <00000600>
        |- linux,phandle = <4>
This tree is an example of a minimal tree. It pretty much contains the minimal set of required nodes and properties to boot a linux kernel, that is some basic model information at the root, the CPUs, the physical memory layout, and misc information passed through /chosen like, in this example, the platform type (mandatory) and the kernel command line arguments (optional).
The /cpus/PowerPC,970@0/linux,boot-cpu property is an example of a property without a value. All other properties have a value. The meaning of the #address-cells and #size-cells properties will be explained in chapter III which defines precisely the required nodes and properties and their content.
3) Device tree "structure" block
The structure of the device tree is a linearized tree structure. The "OF_DT_BEGIN_NODE" token starts a new node, and the "OF_DT_END_NODE" token ends that node definition. Child nodes are simply defined before "OF_DT_END_NODE" (that is nodes within the node). A 'token' is a 32 bits value.
Here's the basic structure of a single node:
* token OF_DT_BEGIN_NODE (that is 0x00000001)
* node full path as a zero terminated string
* [align gap to next 4 bytes boundary]
* for each property:
   * token OF_DT_PROP (that is 0x00000003)
   * 32 bits value of property value size in bytes (or 0 if no value)
   * 32 bits value of offset in string block of property name
   * [align gap to either next 4 bytes boundary if the property value size is less or equal to 4 bytes, or to next 8 bytes boundary if the property value size is larger than 4 bytes]
   * property value data if any
   * [align gap to next 4 bytes boundary]
* [child nodes if any]
* token OF_DT_END_NODE (that is 0x00000002)
So the node content can be summarised as a start token, a full path, a list of properties, a list of child nodes and an end token. Every child node is a full node structure itself as defined above.
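The node layout above can be sketched as a small walker that counts nodes and properties. This is an illustration, not the kernel's unflattening code. The token values are the constants given with the header (OF_DT_END_NODE being 0x2), and the names dt_stats, align_up and dt_walk_struct are hypothetical. Two assumptions are made: the whole structure block is terminated by a final OF_DT_END (0x9) token once the root node is closed, and alignment is computed on offsets from the start of the DT block, which is equivalent to aligning addresses as long as the block itself starts on an 8 bytes boundary.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define OF_DT_BEGIN_NODE 0x1
#define OF_DT_END_NODE   0x2
#define OF_DT_PROP       0x3
#define OF_DT_END        0x9

static uint32_t be32_at(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Round "off" up to a multiple of "a" (a must be a power of two). */
static size_t align_up(size_t off, size_t a)
{
    return (off + a - 1) & ~(a - 1);
}

struct dt_stats { unsigned nodes, props; };

/*
 * Walk the structure block found in blob[off..end).
 * Returns 0 if a well formed tree ending with OF_DT_END was seen,
 * -1 on any malformed or truncated input.
 */
static int dt_walk_struct(const uint8_t *blob, size_t off, size_t end,
                          struct dt_stats *st)
{
    int depth = 0;

    assert(blob != NULL && st != NULL);
    st->nodes = st->props = 0;

    while (off + 4 <= end) {
        uint32_t tok = be32_at(blob + off);
        off += 4;

        switch (tok) {
        case OF_DT_BEGIN_NODE: {
            /* Zero terminated full path, then pad to 4 bytes. */
            const uint8_t *nul = memchr(blob + off, 0, end - off);
            if (nul == NULL)
                return -1;
            off = align_up((size_t)(nul - blob) + 1, 4);
            st->nodes++;
            depth++;
            break;
        }
        case OF_DT_PROP: {
            if (depth == 0 || off + 8 > end)
                return -1;
            uint32_t size = be32_at(blob + off);
            /* The name offset at blob + off + 4 is not needed here. */
            off += 8;
            if (size > 4)           /* large values are 8 bytes aligned */
                off = align_up(off, 8);
            if (off > end || size > end - off)
                return -1;
            off = align_up(off + size, 4);
            st->props++;
            break;
        }
        case OF_DT_END_NODE:
            if (depth == 0)
                return -1;
            depth--;
            break;
        case OF_DT_END:
            return depth == 0 ? 0 : -1;
        default:
            return -1;
        }
    }
    return -1;      /* ran out of data before OF_DT_END */
}
```

Because child nodes are simply nested between the parent's properties and its end token, a depth counter is enough to follow the tree and no recursion is needed.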
4) Device tree "strings" block
In order to save space, property names, which are generally redundant, are stored separately in the "strings" block. This block is simply the whole bunch of zero terminated strings for all property names concatenated together. The device-tree property definitions in the structure block will contain offset values from the beginning of the strings block.
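A generator of flattened trees needs a way to turn a property name into an offset in the strings block while keeping a single copy of each name. Here is a minimal sketch of such a "find or append" helper, under the assumption that the block is just the names laid end to end with their terminating zeroes. The type dt_strings and the function dt_string_offset are hypothetical, and a linear search is used for simplicity.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* A strings block under construction, in caller provided storage. */
struct dt_strings {
    char  *buf;
    size_t used;    /* bytes currently occupied */
    size_t cap;     /* total bytes available in buf */
};

/*
 * Return the offset of "name" in the strings block, appending it
 * first if it is not there yet. Returns -1 if the block is full.
 */
static long dt_string_offset(struct dt_strings *s, const char *name)
{
    assert(s != NULL && s->buf != NULL && name != NULL);

    size_t len = strlen(name) + 1;      /* include the final zero */
    size_t off = 0;

    /* Each stored name starts right after the previous zero byte. */
    while (off < s->used) {
        if (strcmp(s->buf + off, name) == 0)
            return (long)off;
        off += strlen(s->buf + off) + 1;
    }

    if (len > s->cap - s->used)
        return -1;
    memcpy(s->buf + s->used, name, len);
    s->used += len;
    return (long)(s->used - len);
}
```

The returned value is what would be stored in the "offset in string block of property name" field of an OF_DT_PROP entry. With names like "name" and "linux,phandle" present in every node, sharing them this way is where most of the saving comes from.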
III - Required content of the device tree
=========================================
WARNING: All "linux,*" properties defined in this document apply only to a flattened device-tree. If your platform uses a real implementation of Open Firmware or an implementation compatible with the Open Firmware client interface, those properties will be created by the trampoline code in the kernel's prom_init() file. For example, that's where you'll have to add code to detect your board model and set the platform number. However, when using the flattened device-tree entry point, there is no prom_init() pass, and thus you have to provide those properties yourself.
1) Note about cells and address representation
----------------------------------------------
The general rule is documented in the various Open Firmware documentations. If you choose to describe a bus with the device-tree and there exists an OF bus binding, then you should follow the specification. However, the kernel does not require every single device or bus to be described by the device tree.
In general, the format of an address for a device is defined by the parent bus type, based on the #address-cells and #size-cells properties. In the absence of such a property, the parent's parent values are used, etc... The kernel requires the root node to have those properties defining the address format for devices directly mapped on the processor bus.
Those 2 properties define 'cells' for representing an address and a size. A "cell" is a 32 bits number. For example, if both contain 2 like the example tree given above, then an address and a size are both composed of 2 cells, that is a 64 bits number (cells are concatenated and expected to be in big endian format). Another example is the way Apple firmware defines them, that is 2 cells for an address and one cell for a size.
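As a worked example of the cell rule, the following sketch decodes one (address, size) entry of a "reg" property given the parent's #address-cells and #size-cells. The names dt_read_cells, dt_range and dt_read_reg are hypothetical, and the sketch only handles up to 2 cells per number, which covers the 2/2 and 2/1 layouts mentioned above but not bus specific formats using more cells.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Concatenate "ncells" big endian 32 bits cells (0, 1 or 2) into one
 * number. Zero cells yields 0, as for #size-cells = <0> under /cpus.
 */
static uint64_t dt_read_cells(const uint8_t *p, unsigned ncells)
{
    assert(p != NULL && ncells <= 2);

    uint64_t v = 0;
    for (unsigned i = 0; i < 4 * ncells; i++)
        v = (v << 8) | p[i];
    return v;
}

struct dt_range { uint64_t addr, size; };

/* Decode the first (address, size) entry of a "reg" property. */
static struct dt_range dt_read_reg(const uint8_t *reg,
                                   unsigned addr_cells, unsigned size_cells)
{
    struct dt_range r;

    r.addr = dt_read_cells(reg, addr_cells);
    r.size = dt_read_cells(reg + 4 * addr_cells, size_cells);
    return r;
}
```

Applied to the memory node of the example tree, reg = <00000000 00000000 00000000 20000000> with 2 address cells and 2 size cells decodes to address 0 and size 0x20000000, that is 512 MB of RAM starting at physical address 0.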
A device IO or MMIO areas on the bus are defined in the "reg" property. The format of this property depends on the bus the device is sitting on. Standard bus types define their "reg" properties format in the various OF bindings for those bus types, you are free to define your own "reg" format for proprietary busses or virtual busses enclosing on-chip devices, though it is recommended that the parts of the "reg" property containing addresses and sizes do respect the defined #address-cells and #size-cells when those make sense.
Later, I will define more precisely some common address formats.
For a new ppc64 board, I recommend using either the 2/2 format or Apple's 2/1 format, which is slightly more compact since sizes usually fit in a single 32-bit word.
2) Note about "compatible" properties -------------------------------------
Those properties are optional, but recommended in devices and the root node. The format of a "compatible" property is a list of concatenated zero terminated strings. They allow a device to express its compatibility with a family of similar devices, in some cases allowing a single driver to match against several devices regardless of their actual names.
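To make the "list of concatenated zero terminated strings" concrete, here is a minimal Python sketch. The function names and the "acme,..." strings are invented for illustration; the matching rule shown (a driver matches if any listed string is one it supports) is an assumption about how a driver would typically use the property, not a description of the kernel code.

```python
def encode_compatible(names):
    """Join strings into one blob, each terminated by a zero byte."""
    return b"".join(name.encode("ascii") + b"\0" for name in names)

def decode_compatible(blob):
    """Split a "compatible" blob back into its list of strings."""
    if blob and not blob.endswith(b"\0"):
        raise ValueError("last string is not zero terminated")
    return [part.decode("ascii") for part in blob.split(b"\0")[:-1]]

def driver_matches(blob, supported):
    """True if any string in the property is one the driver supports."""
    return any(name in supported for name in decode_compatible(blob))

# Hypothetical device: exact chip first, then a more generic family name
blob = encode_compatible(["acme,eth-1000", "acme,eth"])
```

A driver that only knows the family name "acme,eth" would still match this device, regardless of what its "name" property says.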
3) Note about "name" properties -------------------------------
While earlier users of Open Firmware like OldWorld Macintoshes tended to use the actual device name for the "name" property, it's nowadays considered good practice to use a name that is closer to the device class (often equal to device_type). For example, nowadays, ethernet controllers are named "ethernet", with an additional "model" property defining precisely the chip type/model, and a "compatible" property defining the family in case a single driver can drive more than one of these chips. The kernel however doesn't generally put any restriction on the "name" property, it is simply considered good practice to follow the standard and its evolutions as closely as possible.
4) Required nodes and properties --------------------------------
Note that every node should have a "name" and a "linux,phandle" property. Those aren't specified explicitly below, as their presence is considered implicit. The name property is listed in the cases where its content is defined or follows a common practice.
a) The root node
The root node requires some properties to be present:
  - model : this is your board name/model
  - #address-cells : address representation for "root" devices
  - #size-cells: the size representation for "root" devices
Additionally, some recommended properties are:
  - name : this is generally "device-tree"
  - compatible : the board "family" generally finds its way here. For example, if you have 2 board models with a similar layout that typically get driven by the same platform code in the kernel, you would use a different "model" property but put a common value in "compatible". The kernel doesn't directly use that value (see /chosen/linux,platform for how the kernel chooses a platform type) but it is generally useful.
It's also generally where you add additional properties specific to your board, like the serial number if any, that sort of thing. It is recommended that if you add any "custom" property whose name may clash with standard defined ones, you prefix it with your vendor name and a comma.
b) The /cpus node
This node is the parent of all individual CPUs nodes. It doesn't have any specific requirements, though it's generally good practice to have at least:
  #address-cells = <00000001>
  #size-cells = <00000000>
This defines that the "address" for a CPU is a single cell, and has no meaningful size. This is not necessary but the kernel will assume that format when reading the "reg" properties of a CPU node, see below.
c) The /cpus/* nodes
So under /cpus, you are supposed to create a node for every CPU on the machine. There is no specific restriction on the name of the CPU, though it's common practice to call it PowerPC,<name>. For example, Apple uses PowerPC,G5 while IBM uses PowerPC,970FX.
Required properties:
  - device_type : has to be "cpu"
  - reg : This is the physical cpu number, it's a single 32-bit cell. This is also used as-is as the unit number for constructing the unit name in the full path. For example, with 2 CPUs, you would have the full paths:
      /cpus/PowerPC,970FX@0
      /cpus/PowerPC,970FX@1
    (unit addresses do not need to have leading zeros)
  - d-cache-line-size : one cell, L1 data cache line size in bytes
  - i-cache-line-size : one cell, L1 instruction cache line size in bytes
  - d-cache-size : one cell, size of L1 data cache in bytes
  - i-cache-size : one cell, size of L1 instruction cache in bytes
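The rule for building the unit name from "reg" can be sketched as follows. This is only an illustration: cpu_node_path() is a made-up helper, and printing the unit address in hexadecimal is an assumption based on the usual Open Firmware convention, it is not stated in the text above.

```python
def cpu_node_path(name, reg):
    """Build the full path of a CPU node from its name and "reg" value.

    The unit address is the physical CPU number, printed here in hex
    without leading zeros (hex is an assumption, see the lead-in).
    """
    if not 0 <= reg <= 0xFFFFFFFF:
        raise ValueError("reg must fit in a single 32-bit cell")
    return "/cpus/%s@%x" % (name, reg)

# Two CPUs with physical numbers 0 and 1
paths = [cpu_node_path("PowerPC,970FX", n) for n in range(2)]
```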
Recommended properties:
  - timebase-frequency : a cell indicating the frequency of the timebase in Hz. This is not directly used by the generic code, but you are welcome to copy/paste the pSeries code for setting the kernel timebase/decrementer calibration based on this value.
  - clock-frequency : a cell indicating the CPU core clock frequency in Hz. A new property will be defined for 64-bit values, but if your frequency is < 4GHz, one cell is enough. Here as well as for the above, the common code doesn't use that property, but you are welcome to re-use the pSeries or Maple one. A future kernel version might provide a common function for this.
You are welcome to add any property you find relevant to your board, like some information about the mechanism used to soft-reset the CPUs for example (Apple puts the GPIO number for CPU soft reset lines in there as a "soft-reset" property as they start secondary CPUs by soft-resetting them).
d) the /memory node(s)
To define the physical memory layout of your board, you should create one or more memory node(s). You can either create a single node with all memory ranges in its reg property, or you can create several nodes, as you wish. The unit address (@ part) used for the full path is the address of the first range of memory defined by a given node. If you use a single memory node, this will typically be @0.
Required properties:
  - name : has to be "chosen"
  - device_type : has to be "memory"
  - reg : This property contains all the physical memory ranges of your board. It's a list of addresses/sizes concatenated together, the number of cells of each being defined by the #address-cells and #size-cells of the root node. For example, with both of these properties being 2 like in the example given earlier, a 970 based machine with 6GB of RAM could typically have a "reg" property here that looks like:
  00000000 00000000 00000000 80000000
  00000001 00000000 00000001 00000000
That is a range starting at 0 of 0x80000000 bytes and a range starting at 0x100000000 and of 0x100000000 bytes. You can see that there is no memory covering the IO hole between 2GB and 4GB. Some vendors prefer splitting those ranges into smaller segments, the kernel doesn't care.
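The decoding of that example can be checked with a short Python sketch. parse_reg() is a hypothetical helper written for this illustration, not a kernel function; it only assumes what is described above, namely big endian 32-bit cells and entries made of #address-cells address cells followed by #size-cells size cells.

```python
import struct

def parse_reg(data, address_cells, size_cells):
    """Decode a "reg" property into a list of (address, size) tuples."""
    entry_cells = address_cells + size_cells
    if entry_cells == 0 or len(data) % (4 * entry_cells):
        raise ValueError("reg length does not match the cell counts")
    cells = struct.unpack(">%dI" % (len(data) // 4), data)

    def join(chunk):
        # Concatenate 32-bit cells, most significant cell first
        value = 0
        for cell in chunk:
            value = (value << 32) | cell
        return value

    ranges = []
    for i in range(0, len(cells), entry_cells):
        entry = cells[i:i + entry_cells]
        ranges.append((join(entry[:address_cells]), join(entry[address_cells:])))
    return ranges

# The example property from the text, with #address-cells = #size-cells = 2
reg = bytes.fromhex(
    "00000000 00000000 00000000 80000000"
    "00000001 00000000 00000001 00000000"
)
ranges = parse_reg(reg, 2, 2)
total = sum(size for _, size in ranges)
```

Here ranges holds the two (address, size) pairs described above, and total adds up to 6GB (2GB below the IO hole plus 4GB above it).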
c) The /chosen node
This node is a bit "special". Normally, that's where open firmware puts some variable environment information, like the arguments, or phandle pointers to nodes like the main interrupt controller, or the default input/output devices.
This specification makes a few of these mandatory, but also defines some Linux-specific properties that would normally be constructed by the prom_init() trampoline when booting with an OF client interface, but that you have to provide yourself when using the flattened format.
Required properties:
  - name : has to be "chosen"
  - linux,platform : This is your platform number as assigned by the architecture maintainers
Recommended properties:
  - bootargs : This zero terminated string is passed as the kernel command line
  - linux,stdout-path : This is the full path to your standard console device if any. Typically, if you have serial devices on your board, you may want to put the full path to the one set as the default console in the firmware here, for the kernel to pick it up as its own default console. If you look at the function set_preferred_console() in arch/ppc64/kernel/setup.c, you'll see that the kernel tries to find out the default console and has knowledge of various types like 8250 serial ports. You may want to extend this function to add your own.
  - interrupt-controller : This is one cell containing a phandle value that matches the "linux,phandle" property of your main interrupt controller node. May be used for interrupt routing.
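As a rough sketch of what the values of these recommended properties look like as raw bytes, here is a Python illustration. Everything in it is hypothetical: the helper names, the command line, the console path and the phandle value are made up, and encoding linux,stdout-path as a zero terminated string is an assumption based on how Open Firmware usually stores strings. The format of linux,platform is not covered here.

```python
import struct

def string_prop(text):
    """Encode a string property: ASCII bytes plus a terminating zero."""
    return text.encode("ascii") + b"\0"

def cell_prop(value):
    """Encode a one-cell property as a big-endian 32-bit number."""
    return struct.pack(">I", value)

# All values below are made up for illustration.
pic_phandle = 0x10  # must equal "linux,phandle" of the interrupt controller
chosen = {
    "bootargs": string_prop("console=ttyS0,115200 root=/dev/sda2"),
    "linux,stdout-path": string_prop("/ht@0/isa@4/serial@3f8"),
    "interrupt-controller": cell_prop(pic_phandle),
}
```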
This is all that is currently required. However, it is strongly recommended that you expose PCI host bridges as documented in the PCI binding to open firmware, and your interrupt tree as documented in OF interrupt tree specification.
IV - Recommendation for a bootloader ====================================
Here are some various ideas/recommendations that have been proposed while all this has been defined and implemented.
- It should be possible to write a parser that turns an ASCII representation of a device-tree (or even XML though I find that less readable) into a device-tree block. This would allow to basically build the device-tree structure and strings "blobs" at bootloader build time, and have the bootloader just pass them as-is to the kernel. In fact, the device-tree blob could then be separate from the bootloader itself, and be placed in a separate portion of the flash that can be "personalized" for different board types by flashing a different device-tree.
- The bootloader may want to be able to use the device-tree itself and may want to manipulate it (to add/edit some properties, like physical memory size or kernel arguments). At this point, 2 choices can be made. Either the bootloader works directly on the flattened format, or the bootloader has its own internal tree representation with pointers (similar to the kernel one) and re-flattens the tree when booting the kernel. The former is a bit more difficult to edit/modify, the latter probably requires a bit more code to handle the tree structure. Note that the structure format has been designed so it's relatively easy to "insert" properties or nodes or delete them by just memmove'ing things around. It contains no internal offsets or pointers for this purpose.
- An example of code for iterating nodes & retrieving properties directly from the flattened tree format can be found in the kernel file arch/ppc64/kernel/prom.c, look at the scan_flat_dt() function, its usage in early_init_devtree(), and the corresponding various early_init_dt_scan_*() callbacks. That code can be re-used in a GPL bootloader, and as the author of that code, I would be happy to discuss possible free licensing to any vendor who wishes to integrate all or part of this code into a non-GPL bootloader.

On Thursday 19 May 2005 06:56, Benjamin Herrenschmidt wrote:
d) the /memory node(s) Required properties:
- name : has to be "chosen"
s/chosen/memory/
c) The /chosen node
- linux,platform : This is your platform number as assigned by the architecture maintainers
Does this mean you want a new platform number for every board type? I would guess that it might be easier to extend the maple platform to support all boards with ppc970 and similar CPUs (except the pmac and pSeries ones), just like I would like to extend the BPA platform for all Cell based systems.
This is all that is currently required. However, it is strongly recommended that you expose PCI host bridges as documented in the PCI binding to open firmware, and your interrupt tree as documented in OF interrupt tree specification.
AFAICS, the pci device tree is currently required if you want to use an IOMMU or if you want PCI-X or PCIe style devices with extended PCI config space. I wouldn't be surprised if other functionality also depends on it.
Arnd <><

On Thu, 2005-05-19 at 09:46 +0200, Arnd Bergmann wrote:
On Thursday 19 May 2005 06:56, Benjamin Herrenschmidt wrote:
d) the /memory node(s) Required properties:
- name : has to be "chosen"
s/chosen/memory/
Thanks, will fix.
c) The /chosen node
- linux,platform : This is your platform number as assigned by the architecture maintainers
Does this mean you want a new platform number for every board type? I would guess that it might be easier to extend the maple platform to support all boards with ppc970 and similar CPUs (except the pmac and pSeries ones), just like I would like to extend the BPA platform for all Cell based systems.
I'd rather have a different number per board family. Embedded vendors are likely to hard code all sorts of things and deal with all sorts of funky bits of hardware in their xxx_setup.c code among others, I'd rather have them have a separate platform. Though it is better if they could keep "similar" boards under the same platform number and use the device-tree to differentiate them.
This is all that is currently required. However, it is strongly recommended that you expose PCI host bridges as documented in the PCI binding to open firmware, and your interrupt tree as documented in OF interrupt tree specification.
AFAICS, the pci device tree is currently required if you want to use an IOMMU or if you want PCI-X or PCIe style devices with extended PCI config space. I wouldn't be surprised if other functionality also depends on it.
No, you can use the iommu without the PCI device tree. I've verified that it works on maple by disabling generation of the PCI device tree in PIBS. Extended config space should be fixed too, though it's not an issue with existing bridges yet.
Ben.

AFAICS, the pci device tree is currently required if you want to use an IOMMU
Works fine without it, on Maple at least.
or if you want PCI-X or PCIe style devices with extended PCI config space.
Dunno.
I wouldn't be surprised if other functionality also depends on it.
If you're unlucky enough to have inherited your code from the pSeries port, then yes, it probably does.
Segher

Dear Ben,
in message 1116478614.918.75.camel@gaston you wrote:
And here is a second draft with more infos.
Booting the Linux/ppc64 kernel without Open Firmware
Thanks a lot for taking the initiative to come to an agreement about the kernel boot interface.
I have some concerns about the memory footprint and increased boot time that will result from the proposed solution. There are many embedded systems where resources are tight and requirements are even tighter. It would be probably a good idea to also ask for feedback from these folks - for example by posting your RFC on the celinux-dev mailing list.
But my biggest concern is that we should try to come up with a solution that has a wider acceptance. Especially from the U-Boot point of view it is not exactly nice that each of PowerPC, ARM and MIPS use their very own, completely incompatible way of passing information from the boot loader to the kernel.
As is, your proposal will add just another incompatible way of doing the same thing (of course we will have to stay backward compatible with U-Boot to allow booting older kernels, too).
Why don't we try to come up with a solution that is acceptable to the other architectures as well?
Maybe you want to post the RFC to lkml, or at least to the linux-arm-kernel and linux-mips mailing lists?
Best regards,
Wolfgang Denk

Wolfgang Denk wrote:
Dear Ben,
in message 1116478614.918.75.camel@gaston you wrote:
And here is a second draft with more infos.
Booting the Linux/ppc64 kernel without Open Firmware
Thanks a lot for taking the initiative to come to an agreement about the kernel boot interface.
I have some concerns about the memory footprint and increased boot time that will result from the proposed solution. There are many embedded systems where resources are tight and requirements are even tighter. It would be probably a good idea to also ask for feedback from these folks - for example by posting your RFC on the celinux-dev mailing list.
But my biggest concern is that we should try to come up with a solution that has a wider acceptance. Especially from the U-Boot point of view it is not exactly nice that each of PowerPC, ARM and MIPS use their very own, completely incompatible way of passing information from the boot loader to the kernel.
As is, your proposal will add just another incompatible way of doing the same thing (of course we will have to stay backward compatible with U-Boot to allow booting older kernels, too).
Why don't we try to come up with a solution that is acceptable to the other architectures as well?
Maybe you want to post the RFC to lkml, or at least to the linux-arm-kernel and linux-mips mailing lists?
I'm really interested in having this discussion.
I'm forced to maintain my own u-boot based solution for doing this and I'd be very interested in whatever gets chosen.
IMHO the current mess is considerable, and at this point I wouldn't really care if the resulting solution is less than optimal, as long as there is one.
Best regards,
Wolfgang Denk
Regards
Pantelis

On Thu, 2005-05-19 at 15:18 +0200, Wolfgang Denk wrote:
Dear Ben,
in message 1116478614.918.75.camel@gaston you wrote:
And here is a second draft with more infos.
Booting the Linux/ppc64 kernel without Open Firmware
Thanks a lot for taking the initiative to come to an agreement about the kernel boot interface.
I have some concerns about the memory footprint and increased boot time that will result from the proposed solution.
Like everybody it seems, which is funny in a way as I expect pretty much none (or a few Kb maybe). The kernel side code for managing a device-tree may represent more, but heh, have you seen the size of a ppc64 kernel anyways ? I don't think that is very relevant. On the bootloader side, I don't expect any significant impact. The device-tree can be very small, and the code required on the bootloader side ranges from nothing for a pre-built one, to a little bit if the bootloader has to be able to change/add properties/nodes.
There are many embedded systems where resources are tight and requirements are even tighter.
Amen. (Though heh, this is ppc64, you can't be _that_ tight :)
It would be probably a good idea to also ask for feedback from these folks - for example by posting your RFC on the celinux-dev mailing list.
I will do when I have a little bit more mature proposal.
But my biggest concern is that we should try to come up with a solution that has a wider acceptance.
No other solution will be accepted on the kernel side. At least for ppc64
Especially from the U-Boot point of view it is not exactly nice that each of PowerPC, ARM and MIPS use their very own, completely incompatible way of passing information from the boot loader to the kernel.
True.
As is, your proposal will add just another incompatible way of doing the same thing (of course we will have to stay backward compatible with U-Boot to allow booting older kernels, too).
My proposal is the only supported way to boot a ppc64 kernel. There are talks about backporting support for that to ppc32 as well. Other architectures are welcome to use it too though :) The device-tree in the kernel is fully expanded into a tree structure on ppc, since it's heavily used by various pieces of code all over the place, but for other architectures that would like to use that, it's possible to limit themselves to the flattened format. The ppc64 kernel contains some code to access nodes & properties directly from the flattened format (used early during boot) which represents very little code.
Why don't we try to come up with a solution that is acceptable to the other architectures as well?
This has been discussed over and over again, that is the best way to never come up with a solution as everybody will want something different and nobody will ever agree.
The present proposal is implemented today on the ppc64 kernel already, and we have decided to not go backward on this requirement.
Maybe you want to post the RFC to lkml, or at least to the linux-arm-kernel and linux-mips mailing lists?
Best regards,
Wolfgang Denk

Dear Ben,
in message 1116541230.5153.8.camel@gaston you wrote:
I have some concerns about the memory footprint and increased boot time that will result from the proposed solution.
Like everybody it seems, which is funny in a way as I expect pretty much none (or a few Kb maybe). The kernel side code for managing a device-tree may represent more, but heh, have you seen the size of a
I am not so narrow-minded to think only about U-Boot. I try to think about the whole system, including boot loader, kernel, and any data that might need to get passed between these two.
And please believe me, there are many, many systems out there where "a few Kb" really matter.
ppc64 kernel anyways ? I don't think that is very relevant. On the
I am aware that you think so, and I try to raise your awareness of the fact that there is a huge number of small machines out there.
Please keep in mind that the same interface will be forced sooner or later on small 8xx systems with maybe just 4 MB flash and 8 or 16 MB RAM.
And when you sell 100,000 of these units per year then "a few Kb" may cost a lot of money. Or may cause another, proprietary OS to get used.
bootloader side, I don't expect any significant impact. The device-tree can be very small, and the code required on the bootloader side ranges from nothing for a pre-built one, to a little bit if the bootloader has to be able to change/add properties/nodes.
It is IMHO wrong to have only the boot loader side in mind. We should consider the whole system.
There are many embedded systems where resources are tight and requirements are even tighter.
Amen. (Though heh, this is ppc64, you can't be _that_ tight :)
I think you are aware that there are several people out there working on a similar boot interface for the "small" PPC systems, too.
It would be probably a good idea to also ask for feedback from these folks - for example by posting your RFC on the celinux-dev mailing list.
I will do when I have a little bit more mature proposal.
Thanks in advance.
But my biggest concern is that we should try to come up with a solution that has a wider acceptance.
No other solution will be accepted on the kernel side. At least for ppc64
This is not exactly a constructive position. When each architecture comes up with its own solution for the same problem and then claims that no other solution will be accepted we will stick with what we have now: a mess.
If this is really your position we may as well stop here.
As is, your proposal will add just another incompatible way of doing the same thing (of course we will have to stay backward compatible with U-Boot to allow booting older kernels, too).
My proposal is the only supported way to boot a ppc64 kernel. There are
Yes, of course. And using ATAGS is the only supported way to boot an ARM kernel, and so on.
If everybody claims that his way of doing things is the only accepted solution we can really save all the time we are wasting on such a discussion.
talks about backporting support for that to ppc32 as well. Other architectures are welcome to use it too though :) The device-tree in the
Ummm.. Ben, I have really high respect for you, but such a position is simply arrogant. With the same right the ARM folks can say that ATAGS is the way to go and other architectures are welcome to use it. Actually they might have older rights.
Why don't we try to come up with a solution that is acceptable to the other architectures as well?
This has been discussed over and over again, that is the best way to never come up with a solution as everybody will want something different and nobody will ever agree.
With such a position I really wonder why you ever asked?
The present proposal is implemented today on the ppc64 kernel already, and we have decided to not go backward on this requirement.
Then why the heck do you call this a RFC or a proposal? To me it seems that you don't propose but dictate a solution - a solution which pretty much ignores everything but your own requirements. If everything has been decided already I can as well shut up.
But please never claim that this has been _discussed_.
Best regards,
Wolfgang Denk

I am aware that you think so, and I try to raise your awareness of the fact that there is a huge number of small machines out there.
Please keep in mind that the same interface will be forced sooner or later on small 8xx systems with maybe just 4 MB flash and 8 or 16 MB RAM.
I will not force it, but others may find it a good idea to do so :)
It is IMHO wrong to have only the boot loader side in mind. We should consider the whole system.
I do have the kernel in mind as well. The fact is the ppc64 kernel relies on an Open Firmware device tree and we do not want at any cost to get into the mess that is ppc32. We decided to define this flattened format for that purpose, and to allow kexec functionality. I did my best to keep the format as compact as possible (maybe a little bit more could be saved by changing the way the full paths are laid out, maybe we could even do a new version which gzips the whole blob, but overall, it's fairly small).
On the kernel side, as I wrote as well, the code for dealing with the device-tree isn't that big, and will get smaller as I remove the post-processing of nodes in prom.c that we still have here. And as I wrote, if other platforms want to re-use that mechanism, they may want to just use the compact/flattened format directly. The function for scanning nodes in the flattened tree is about 40 lines of C and the function for accessing a property in a flattened node is about as much.
I think you are aware that there are several people out there working on a similar boot interface for the "small" PPC systems, too.
I know, and I was at the origin of the bi_rec proposal, a few years ago. I've simply never seen anything actually happening.
No other solution will be accepted on the kernel side. At least for ppc64
This is not exactly a constructive position. When each architecture comes up with its own solution for the same problem and then claims that no other solution will be accepted we will stick with what we have now: a mess.
If this is really your position we may as well stop here.
The ppc64 kernel relies on an open firmware style device tree. That will not change any time soon. This proposal is a way to define a subset of this device-tree along with a compact & flattened format so that one doesn't have to do a full Open Firmware implementation and so that minimal trees can be used.
Yes, of course. And using ATAGS is the only supported way to boot an ARM kernel, and so on.
If everybody claims that his way of doing things is the only accepted solution we can really save all the time we are wasting on such a discussion.
Maybe. I'd rather have this proposal completed and have actual comments about the _content_ of it rather than such a debate at this point. Once we have that working, we can talk about extending it.
talks about backporting support for that to ppc32 as well. Other architectures are welcome to use it too though :) The device-tree in the
Ummm.. Ben, I have really high respect for you, but such a position is simply arrogant. With the same right the ARM folks can say that ATAGS is the way to go and other architectures are welcome to use it. Actually they might have older rights.
May well be. But that's off topic. The decision has been made already.
Why don't we try to come up with a solution that is acceptable to the other architectures as well?
This has been discussed over and over again, that is the best way to never come up with a solution as everybody will want something different and nobody will ever agree.
With such a position I really wonder why you ever asked?
I'm asking for comments about the content of the proposal and posting to inform people of what's going on. You are the one wanting to extend it to other architectures :)
The present proposal is implemented today on the ppc64 kernel already, and we have decided to not go backward on this requirement.
Then why the heck do you call this a RFC or a proposal? To me it seems that you don't propose but dictate a solution - a solution which pretty much ignores everything but your own requirements. If everything has been decided already I can as well shut up.
I'm asking for comments about the actual details of it, if something was overlooked in the format (though that actually works today), if my wording is wrong in parts, if we should define in more details some aspect of it.
But please never claim that this has been _discussed_.
No, what I meant earlier is that trying to come up with something like that, as you stated earlier, has been discussed again and again and again without any useful result.

On Fri, 2005-05-20 at 01:14 +0200, Wolfgang Denk wrote:
ppc64 kernel anyways ? I don't think that is very relevant. On the
I am aware that you think so, and I try to raise your awareness of the fact that there is a huge number of small machines out there.
Please keep in mind that the same interface will be forced sooner or later on small 8xx systems with maybe just 4 MB flash and 8 or 16 MB RAM.
I don't seem to be getting the point: As you proved conclusively on your website, 2.6 (and IMHO very likely anything that will come after it) does not scale down well to small systems like the 8xx any more anyways.
And I don't think such a major change will be "forced" upon the mostly frozen 2.4 tree.
So why try to stop the folks that want to unite the current "mess" in a proven superset datastructure that seems to suit quite fine with all chips that came into production for (at least) the last five years?

On May 19, 2005, at 8:18 AM, Wolfgang Denk wrote:
But my biggest concern is that we should try to come up with a solution that has a wider acceptance. Especially from the U-Boot point of view it is not exactly nice that each of PowerPC, ARM and MIPS use their very own, completely incompatible way of passing information from the boot loader to the kernel.
As is, your proposal will add just another incompatible way of doing the same thing (of course we will have to stay backward compatible with U-Boot to allow booting older kernels, too).
Why don't we try to come up with a solution that is acceptable to the other architectures as well?
Maybe you want to post the RFC to lkml, or at least to the linux-arm-kernel and linux-mips mailing lists?
As you observe, having multiple incompatible communication mechanisms is an issue of u-boot code maintenance. Since you are the most affected party, perhaps you could propose something for all the architectures? You're obviously much more in tune with the needs of ARM and MIPS...
In the meantime, it sounds like this device tree stuff solves ppc64's problem in a way the maintainers are happy with, so it's hard to ask them to come up with a solution to a problem they don't have.
-Hollis

In message 6fcc07be88e5091ac1428e9bbde6d92f@penguinppc.org you wrote:
Maybe you want to post the RFC to lkml, or at least to the linux-arm-kernel and linux-mips mailing lists?
As you observe, having multiple incompatible communication mechanisms is an issue of u-boot code maintenance. Since you are the most affected
No, it's vice versa. U-Boot has always been just implementing what the kernel does. There are many other boot loaders around that all have to adhere to the interface(s) imposed on them by the kernel.
In the meantime, it sounds like this device tree stuff solves ppc64's problem in a way the maintainers are happy with, so it's hard to ask them to come up with a solution to a problem they don't have.
Well, actually nobody has problems: the ARM and MIPS folks have working solutions, too. The next architecture will implement yet another way of passing information to the kernel, implement it and state that they will not accept any other solution, and so on.
Best regards,
Wolfgang Denk

Wolfgang Denk writes:
But my biggest concern is that we should try to come up with a solution that has a wider acceptance. Especially from the U-Boot point of view it is not exactly nice that each of PowerPC, ARM and MIPS use their very own, completely incompatible way of passing information from the boot loader to the kernel.
I am familiar with birecs and I have looked at the ARM atags structure, which is the same as birecs at an abstract level, i.e. a list of arbitrary blobs of data, each with a binary tag and a size.
As far as MIPS is concerned, there didn't seem to be any single consistent way of passing information from the bootloader to the kernel. They seem to be in a similar mess to ppc32 in this respect. I want to avoid that mess for ppc64 by stating now, while there is only one embedded ppc64 board that runs linux (the Maple eval board) that there is one true way to pass information into the kernel at boot time, and that is a flattened device tree.
Birecs and atags are both OK at representing a specified, limited set of items of information, such as the location and size of an initrd image or the total amount of memory in a system. They fall down when it comes to giving information about the devices in the system and their interconnections. For instance, atags has a structure for representing a frame buffer - but what if you have two video cards in your system?
Essentially, each element in the birecs/atags list is like a property in a device tree that has only one node, and the entire birecs/atags list is like a 1-node device tree. What the device tree gives you is the ability to organize those pieces of information hierarchically so that it becomes obvious when you have multiple instances of a device (e.g. a PCI host bridge), what pieces of information apply to which device instances, and which devices have to be used to get to certain other devices.
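The hierarchy argument can be sketched in C as follows. This is only a toy in-memory model under stated assumptions: the `struct node` layout, the node names and unit addresses, and the helpers `count_type` and `parent_of` are hypothetical, and each node carries a single `device_type` property for brevity. It is not the flattened device-tree format itself.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy tree node: one property only, to keep the sketch short. */
struct node {
    const char *name;
    const char *device_type;     /* may be NULL */
    const struct node *child;    /* first child */
    const struct node *sibling;  /* next node at the same level */
};

/* Two video cards, each behind its own PCI host bridge (made-up names). */
const struct node video1 = { "display@0", "display", NULL, NULL };
const struct node pci1   = { "pci@f2000000", "pci", &video1, NULL };
const struct node video0 = { "display@0", "display", NULL, NULL };
const struct node pci0   = { "pci@f0000000", "pci", &video0, &pci1 };
const struct node root   = { "/", NULL, &pci0, NULL };

/* Count every node of the given type in n, its siblings and all subtrees. */
int count_type(const struct node *n, const char *type)
{
    int count = 0;

    assert(type != NULL);
    for (; n; n = n->sibling) {
        if (n->device_type && strcmp(n->device_type, type) == 0)
            count++;
        count += count_type(n->child, type);
    }
    return count;
}

/* Find which node has to be used to reach target (its parent), or NULL. */
const struct node *parent_of(const struct node *parent, const struct node *target)
{
    const struct node *c;

    for (c = parent->child; c; c = c->sibling) {
        const struct node *p;

        if (c == target)
            return parent;
        p = parent_of(c, target);
        if (p)
            return p;
    }
    return NULL;
}
```

With this layout, `count_type(&root, "display")` finds both frame buffers, and `parent_of` tells which host bridge each one sits behind, which is exactly the information a one-level tag list has no place for.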
Thus, my opinion is that the device tree is technically superior to the birecs/atags approach. The device tree has also proven itself to be capable of representing the information that the kernel needs about all sorts of systems from the very small to the very large. Unless you can come up with something even better, ppc64 won't be changing. In particular we're not going to go back to anything like birecs or atags.
Also, given that a minimal flattened device tree fits in well under 1kB, any arguments about "excessive" memory usage will need to be accompanied by specific code and data sizes of a real-world example.
As is, your proposal will add just another incompatible way of doing the same thing (of course we will have to stay backward compatible with U-Boot to allow booting older kernels, too).
U-Boot currently doesn't support any ppc64 machines, does it? So how is there a backward compatibility issue?
Ben's proposal is for ppc64, at least at present. If the ppc32 embedded developers decide they want to use a device tree, that would be good, but it will proceed by
Why don't we try to come up with a solution that is acceptable to the other architectures as well?
Other architectures are welcome to move to using a device tree. The problem is going to be convincing them to spend the effort to make the change. None of the other architectures currently have a solution that is appealing.
Paul.

I wrote:
Ben's proposal is for ppc64, at least at present. If the ppc32 embedded developers decide they want to use a device tree, that would be good, but it will proceed by
... and got interrupted. I meant to write "proceed by persuasion and consensus, not fiat".
Paul.

On May 18, 2005, at 11:56 PM, Benjamin Herrenschmidt wrote:
- name has to be "chosen"
- linux,platform : This is your platform number as assigned by the architecture maintainers
Given the seemingly endless embedded boards not developed by "core community" folks, wouldn't it be better for firmware to identify itself with a distributed namespace like "vendor.model" and let the kernel figure out whatever unique number that should be?
Requiring everyone to request a special number from kernel maintainers seems unnecessary. Or perhaps you're trying to enforce tighter development interaction...?
-Hollis

On Thu, 2005-05-19 at 23:26 -0500, Hollis Blanchard wrote:
On May 18, 2005, at 11:56 PM, Benjamin Herrenschmidt wrote:
- name has to be "chosen"
- linux,platform : This is your platform number as assigned by the architecture maintainers
Given the seemingly endless embedded boards not developed by "core community" folks, wouldn't it be better for firmware to identify itself with a distributed namespace like "vendor.model" and let the kernel figure out whatever unique number that should be?
This is something I've been thinking about.
Requiring everyone to request a special number from kernel maintainers seems unnecessary. Or perhaps you're trying to enforce tighter development interaction...?
Nope. The platform number is an existing thing, and the kernel isn't yet completely ready for getting rid of it, though I'd like to. It would indeed be nice to rely only on /model and /compatible (or whatever other properties).
In fact, the kernel already iterates through ppc_md board structures and calls a probe() function to select which one to use! However, all of them currently just test the platform number :)
The reason for that is partly historical. We have some code, including low-level assembly code, that tests the platform number for things like LPAR interaction with a hypervisor. We also have a bit of platform-specific code that runs very early, in things like the parsing of the interrupt tree or processor nodes, and that needs to differentiate between powermac and pseries because of differences in the way they lay things out.
However, it would definitely make sense to define a single platform number "PLATFORM_GENERIC" for every new board that doesn't need such low level interactions (I would expect something like a Xen port to require a new platform number for the sake of the low level assembly stuff but not every new embedded board) and fix the remaining places where we actually test it for things like detecting the northbridge type.
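The probe-based selection discussed here could look roughly like the C sketch below. It is an assumption-laden illustration, not the kernel's actual code: `struct boot_info`, `struct machdep`, `select_machine`, the "acme,board1" compatible string and the numeric platform values are all made up for the example, and PLATFORM_GENERIC is only the name suggested in this message.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Illustrative platform numbers; the values are assumptions. */
#define PLATFORM_PSERIES  0x0100
#define PLATFORM_POWERMAC 0x0400
#define PLATFORM_GENERIC  0x0800  /* hypothetical catch-all number */

/* What the boot code would have extracted from the device tree. */
struct boot_info {
    int platform;            /* value of linux,platform */
    const char *compatible;  /* root "compatible", reduced to one string */
};

/* Simplified stand-in for a ppc_md style board structure. */
struct machdep {
    const char *name;
    int (*probe)(const struct boot_info *bi);
};

/* Legacy platforms keep testing the platform number. */
static int pseries_probe(const struct boot_info *bi)
{
    return bi->platform == PLATFORM_PSERIES;
}

static int powermac_probe(const struct boot_info *bi)
{
    return bi->platform == PLATFORM_POWERMAC;
}

/* A new embedded board shares PLATFORM_GENERIC and matches on "compatible". */
static int acme_probe(const struct boot_info *bi)
{
    return bi->platform == PLATFORM_GENERIC &&
           bi->compatible != NULL &&
           strcmp(bi->compatible, "acme,board1") == 0;
}

static const struct machdep machines[] = {
    { "pseries",     pseries_probe },
    { "powermac",    powermac_probe },
    { "acme-board1", acme_probe },
};

/* Return the first board structure whose probe() accepts this boot. */
const struct machdep *select_machine(const struct boot_info *bi)
{
    size_t i;

    assert(bi != NULL);
    for (i = 0; i < sizeof machines / sizeof machines[0]; i++)
        if (machines[i].probe(bi))
            return &machines[i];
    return NULL;
}
```

In this arrangement only platforms that need low-level special casing get their own number, while every other board is told apart by its properties inside its own probe() function.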
I'll see what can be done after I finish version 3 of the proposal which already contains a lot of changes and associated kernel patches :)
Ben.