
Hi all,
In our bootloader based off of 2013.07 we make extensive use of the flat device tree. In profiling our bootloader in our simulator I found that the function eating up the most time is fdt_next_tag. Looking at it, especially fdt_offset_ptr, it looks like there is a lot of room for improvement especially in the skip name section.
Some of the checks in fdt_offset_ptr also look useless, such as if ((offset + len) < offset) which will always be false, or if (p + len < p)
len is always positive.
-Aaron

Hi Aaron,
On Wed, 16 Oct 2013 23:24:04 -0700, Aaron Williams Aaron.Williams@caviumnetworks.com wrote:
Hi all,
In our bootloader based off of 2013.07 we make extensive use of the flat device tree. In profiling our bootloader in our simulator I found that the function eating up the most time is fdt_next_tag. Looking at it, especially fdt_offset_ptr, it looks like there is a lot of room for improvement especially in the skip name section.
Some of the checks in fdt_offset_ptr also look useless, such as if ((offset + len) < offset) which will always be false, or if (p + len < p)
len is always positive.
I guess these tests are to catch corruption cases where the pointer plus length wrap around the address space. Granted, this is far from perfect protection, but still, it makes some sense.
In any case, if you found places where the code can be optimized, feel free to directly post a patch!
(you can still put notes and comments below the '---' line)
-Aaron
Amicalement,

Dear Aaron Williams,
In message 525F8284.4000304@caviumnetworks.com you wrote:
Some of the checks in fdt_offset_ptr also look useless, such as if ((offset + len) < offset) which will always be false, or if (p + len < p)
What happens if the "offset" or "p" point to addresses close to the upper end of the address space, and adding "len" makes it wrap around?
Best regards,
Wolfgang Denk

Hi Wolfgang,
On 18 October 2013 07:55, Wolfgang Denk wd@denx.de wrote:
In message 525F8284.4000304@caviumnetworks.com you wrote:
Some of the checks in fdt_offset_ptr also look useless, such as if ((offset + len) < offset) which will always be false, or if (p + len < p)
What happens if the "offset" or "p" point to addresses close to the upper end of the address space, and adding "len" makes it wrap around?
I'm not sure how particular U-Boot is about this, but the C standard doesn't specify what to do in the situation of signed overflow, so it's possible that these checks could be simply optimised away. The portable way to write it (I believe) is: if (INT_MAX - len < offset). I don't know what GCC does in this situation specifically though.
Regards, Andre

Dear Andre,
In message CAPfzE3a2ne-xcjKUTK8WS78V0yxuSd50wSQvm1rSPgNUfwP4Ow@mail.gmail.com you wrote:
Some of the checks in fdt_offset_ptr also look useless, such as if ((offset + len) < offset) which will always be false, or if (p + len < p)
What happens if the "offset" or "p" point to addresses close to the upper end of the address space, and adding "len" makes it wrap around?
I'm not sure how particular U-Boot is about this, but the C standard doesn't specify what to do in the situation of signed overflow, so
These are no signed numbers, right?
it's possible that these checks could be simply optimised away. The
This is not hwat happens.
portable way to write it (I believe) is: if (INT_MAX - len < offset). I don't know what GCC does in this situation specifically though.
This has nothing to do with GCC. It's a standard C question. Inmy understanding, the expression "offset + len" (with "offset" being "int", and "len" being "unsigned int"), will give an "unsigned int"; the comparison willt hen also be done using "unsigned int" data types.
So if you want to write a "portable" expression (though I have to admit that I don't see how this would be more portable, or how the current code is less portable), that would be:
(UINT_MAX - len < offset)
At least that would give the same results - but to me the meaning would be totally unclear when I read the code - while I think I understand the current form.
Best regards,
Wolfgang Denk

Hi Wolfgang,
On Fri, Oct 18, 2013 at 4:11 PM, Wolfgang Denk wd@denx.de wrote:
Dear Andre,
In message < CAPfzE3a2ne-xcjKUTK8WS78V0yxuSd50wSQvm1rSPgNUfwP4Ow@mail.gmail.com> you wrote:
Some of the checks in fdt_offset_ptr also look useless, such as if ((offset + len) < offset) which will always be false, or if (p + len < p)
What happens if the "offset" or "p" point to addresses close to the upper end of the address space, and adding "len" makes it wrap around?
I'm not sure how particular U-Boot is about this, but the C standard doesn't specify what to do in the situation of signed overflow, so
These are no signed numbers, right?
it's possible that these checks could be simply optimised away. The
This is not hwat happens.
Actually, it is my understanding that the "if (p + len < p)" can be optimized away. This exact case is discussed in the LWN article "GCC and pointer overflows"[1].
Basically, the C standard states that pointer arithmetic should not cause overflow, thus allowing the compiler to assume that "p + len" must always be greater than "p".
The exact wording can be found in section 6.5.6 "Additive operators", statement 8 of the C99 specification:
If both the pointer operand and the result point to elements of the same array object, or one past the last element of the array object, the evaluation shall not produce an overflow; otherwise, the behavior is undefined.
Thus, this use of pointer overflow is undefined.
This has nothing to do with GCC. It's a standard C question. Inmy
understanding, the expression "offset + len" (with "offset" being "int", and "len" being "unsigned int"), will give an "unsigned int"; the comparison willt hen also be done using "unsigned int" data types.
I agree this expression ((offset + len) < offset) is safe, since it will used unsigned ints. Since "int" and "unsigned int" have the same rank, "offset" will be converted to an unsigned int before the addition. This comes from section 6.3.1.8 in the C99 specification:
Otherwise, if the operand that has unsigned integer type has rank greater or equal to the rank of the type of the other operand, then the operand with signed integer type is converted to the type of the operand with unsigned integer type.
[1]: http://lwn.net/Articles/278137/
Regards, Michael Pratt

Dear Michael,
In message CAPx6ZwHLn-VABzQOyAMF+T2VyQEc3MZDi1E_kdTVZG8OkJmBYA@mail.gmail.com you wrote:
it's possible that these checks could be simply optimised away. The
This is not hwat happens.
Actually, it is my understanding that the "if (p + len < p)" can be optimized away. This exact case is discussed in the LWN article "GCC and pointer overflows"[1].
No, this does not apply here. You miss a key point. We are not doing any pointer arithmetics here. We have:
int offset; unsigned int len;
and then do:
if (((offset + len) < offset) ...) ...
Basically, the C standard states that pointer arithmetic should not cause overflow, thus allowing the compiler to assume that "p + len" must always be greater than "p".
This is totally irrelevant here. There are no pointers being used here.
--001a1133769056575204e93035f1 Content-Type: text/html; charset=UTF-8 Content-Transfer-Encoding: quoted-printable
Can you please stop posting HTML? Thanks!
Best regards,
Wolfgang Denk

Hi Wolfgang,
On Mon, Oct 21, 2013 at 3:55 PM, Wolfgang Denk wd@denx.de wrote:
Dear Michael,
In message CAPx6ZwHLn-VABzQOyAMF+T2VyQEc3MZDi1E_kdTVZG8OkJmBYA@mail.gmail.com you wrote:
it's possible that these checks could be simply optimised away. The
This is not hwat happens.
Actually, it is my understanding that the "if (p + len < p)" can be optimized away. This exact case is discussed in the LWN article "GCC and pointer overflows"[1].
No, this does not apply here. You miss a key point. We are not doing any pointer arithmetics here. We have:
int offset; unsigned int len;
and then do:
if (((offset + len) < offset) ...)
We seem to have a misunderstanding, I did not mean to imply that the "offset + len" expression was undefined. I agree that it is fine. I was referring to this statement from Aaron's original email:
if (p + len < p)
Which is following:
const char *p; unsigned int len;
That is the statement which is utilizing pointer overflow. .
--001a1133769056575204e93035f1 Content-Type: text/html; charset=UTF-8 Content-Transfer-Encoding: quoted-printable
Can you please stop posting HTML? Thanks!
Sorry about that! Hopefully this message is correct.
Regards, Michael Pratt

Dear Michael,
In message CAPx6ZwGMzAzJo=PuyRCJyJ_w=7kPjtdSHJTYYoB6dpH7pd+xLA@mail.gmail.com you wrote:
We seem to have a misunderstanding, I did not mean to imply that the "offset + len" expression was undefined. I agree that it is fine. I was referring to this statement from Aaron's original email:
if (p + len < p)
Which is following:
const char *p; unsigned int len;
That is the statement which is utilizing pointer overflow.
Agreed. That should be fixed.
Best regards,
Wolfgang Denk

Hi Aaron,
On Thu, Oct 17, 2013 at 12:24 AM, Aaron Williams Aaron.Williams@caviumnetworks.com wrote:
Hi all,
In our bootloader based off of 2013.07 we make extensive use of the flat device tree. In profiling our bootloader in our simulator I found that the function eating up the most time is fdt_next_tag. Looking at it, especially fdt_offset_ptr, it looks like there is a lot of room for improvement especially in the skip name section.
Some of the checks in fdt_offset_ptr also look useless, such as if ((offset
- len) < offset) which will always be false, or
if (p + len < p)
len is always positive.
Are you using CONFIG_OF_CONTROL?
If so, as a higher-level point, we could bring in an efficient DT library, which converts the the FDT into a tree structure for faster parsing. I can point you to a starting point if you like.
Regards, Simon
-Aaron
-- Aaron Williams Software Engineer Cavium, Inc. (408) 943-7198 (510) 789-8988 (cell)
U-Boot mailing list U-Boot@lists.denx.de http://lists.denx.de/mailman/listinfo/u-boot

On Thu, 2013-10-17 at 16:27 -0600, Simon Glass wrote:
Hi Aaron,
On Thu, Oct 17, 2013 at 12:24 AM, Aaron Williams Aaron.Williams@caviumnetworks.com wrote:
Hi all,
In our bootloader based off of 2013.07 we make extensive use of the flat device tree. In profiling our bootloader in our simulator I found that the function eating up the most time is fdt_next_tag. Looking at it, especially fdt_offset_ptr, it looks like there is a lot of room for improvement especially in the skip name section.
Some of the checks in fdt_offset_ptr also look useless, such as if ((offset
- len) < offset) which will always be false, or
if (p + len < p)
len is always positive.
Are you using CONFIG_OF_CONTROL?
If so, as a higher-level point, we could bring in an efficient DT library, which converts the the FDT into a tree structure for faster parsing. I can point you to a starting point if you like.
I've also seen it be a noticeable performance problem (in slow simulation environments) on the extensive DT fixups we do on FSL PPC (repeated calls to functions like fdt_node_offset_by_compat_reg are particularly bad).
Though, when it comes to parsing the tree, rather than modifying it, I'm not sure that the flattened data structure is all that much worse than an unflattened tree. Lookups by path would be faster, but lookups by compatible would still have to visit every node. I think the usage patterns are at least part of the problem -- repeatedly scanning the entire tree, rather than going over it once and assigning node offsets to drivers. The driver model ought to help here.
FWIW, if there is interest in unflattening the device tree, Freescale's hypervisor has code for this: http://git.freescale.com/git/cgit.cgi/ppc/sdk/hypervisor/hypervisor.git/tree... http://git.freescale.com/git/cgit.cgi/ppc/sdk/hypervisor/hypervisor.git/tree...
I'm not sure how it compares to the code Simon had in mind, but it supports merging nodes which could be useful for boards that do dynamic device tree updates.
-Scott

Hi Scott,
On Thu, Nov 14, 2013 at 4:36 PM, Scott Wood scottwood@freescale.com wrote:
On Thu, 2013-10-17 at 16:27 -0600, Simon Glass wrote:
Hi Aaron,
On Thu, Oct 17, 2013 at 12:24 AM, Aaron Williams Aaron.Williams@caviumnetworks.com wrote:
Hi all,
In our bootloader based off of 2013.07 we make extensive use of the
flat
device tree. In profiling our bootloader in our simulator I found that
the
function eating up the most time is fdt_next_tag. Looking at it,
especially
fdt_offset_ptr, it looks like there is a lot of room for improvement especially in the skip name section.
Some of the checks in fdt_offset_ptr also look useless, such as if
((offset
- len) < offset) which will always be false, or
if (p + len < p)
len is always positive.
Are you using CONFIG_OF_CONTROL?
If so, as a higher-level point, we could bring in an efficient DT library, which converts the the FDT into a tree structure for faster parsing. I can point you to a starting point if you like.
I've also seen it be a noticeable performance problem (in slow simulation environments) on the extensive DT fixups we do on FSL PPC (repeated calls to functions like fdt_node_offset_by_compat_reg are particularly bad).
Though, when it comes to parsing the tree, rather than modifying it, I'm not sure that the flattened data structure is all that much worse than an unflattened tree. Lookups by path would be faster, but lookups by compatible would still have to visit every node. I think the usage
My thought was that compatible strings would be collated somewhere so there is an efficient way to find them. However, that might violate the simplicity rule.
patterns are at least part of the problem -- repeatedly scanning the entire tree, rather than going over it once and assigning node offsets to drivers. The driver model ought to help here.
FWIW, if there is interest in unflattening the device tree, Freescale's hypervisor has code for this:
http://git.freescale.com/git/cgit.cgi/ppc/sdk/hypervisor/hypervisor.git/tree...
http://git.freescale.com/git/cgit.cgi/ppc/sdk/hypervisor/hypervisor.git/tree...
I'm not sure how it compares to the code Simon had in mind, but it supports merging nodes which could be useful for boards that do dynamic device tree updates.
Well that code looks great. I'm not sure if we have the demand as yet (as you say driver model should largely allow the tree to be scanned sequentially and only once) but I am sure it will come up one day.
Regards, Simon

Hi Simon,
Sorry for the long delay.
On 10/17/2013 03:27 PM, Simon Glass wrote:
Hi Aaron,
On Thu, Oct 17, 2013 at 12:24 AM, Aaron Williams Aaron.Williams@caviumnetworks.com wrote:
Hi all,
In our bootloader based off of 2013.07 we make extensive use of the flat device tree. In profiling our bootloader in our simulator I found that the function eating up the most time is fdt_next_tag. Looking at it, especially fdt_offset_ptr, it looks like there is a lot of room for improvement especially in the skip name section.
Some of the checks in fdt_offset_ptr also look useless, such as if ((offset
- len) < offset) which will always be false, or
if (p + len < p)
len is always positive.
Are you using CONFIG_OF_CONTROL?
If so, as a higher-level point, we could bring in an efficient DT library, which converts the the FDT into a tree structure for faster parsing. I can point you to a starting point if you like.
Regards, Simon
A higher-level point is not desirable since when we are experiencing the performance issues we are running out of NOR flash or our simulator. Since most of our customers use NOR flash this a huge issue for us. We have very little memory available for holding data structures since basically only the stack is available before relocation.
Taking out these checks significantly sped up our boot process.
If you're checking for a wrap-around it should not check for each byte but should check only once if it will wrap and handle it accordingly. If we're wrapping then the device tree is hosed and we have bigger problems.
-Aaron

Hi Aaron,
On 13 January 2014 23:13, Aaron Williams Aaron.Williams@caviumnetworks.com wrote:
Hi Simon,
Sorry for the long delay.
On 10/17/2013 03:27 PM, Simon Glass wrote:
Hi Aaron,
On Thu, Oct 17, 2013 at 12:24 AM, Aaron Williams Aaron.Williams@caviumnetworks.com wrote:
Hi all,
In our bootloader based off of 2013.07 we make extensive use of the flat device tree. In profiling our bootloader in our simulator I found that the function eating up the most time is fdt_next_tag. Looking at it, especially fdt_offset_ptr, it looks like there is a lot of room for improvement especially in the skip name section.
Some of the checks in fdt_offset_ptr also look useless, such as if ((offset
- len) < offset) which will always be false, or
if (p + len < p)
len is always positive.
Are you using CONFIG_OF_CONTROL?
If so, as a higher-level point, we could bring in an efficient DT library, which converts the the FDT into a tree structure for faster parsing. I can point you to a starting point if you like.
Regards, Simon
A higher-level point is not desirable since when we are experiencing the performance issues we are running out of NOR flash or our simulator. Since most of our customers use NOR flash this a huge issue for us. We have very little memory available for holding data structures since basically only the stack is available before relocation.
Taking out these checks significantly sped up our boot process.
If you're checking for a wrap-around it should not check for each byte but should check only once if it will wrap and handle it accordingly. If we're wrapping then the device tree is hosed and we have bigger problems.
Are you scanning through the FDT multiple times before relocation? Certainly libfdt is designed to be careful about things and there are many checks. Are you suggesting adding some kind of CONFIG optoin tot turn them off?
I'm having a hard time understanding why these simple checks (which would expand to a few machine instructions) should take so long. Have you confirmed that removing them does significantly speed up the hardware, and it is not just an artifact of your profiling system?
It is certainly possible to pass a -ve number as the device tree offset to any of the exported functions. This should result in correct behaviour (returning an error) rather than a crash.
Of course anything that speeds up the code is welcome so long as it is still correct.
Regards, Simon
-Aaron
-- Aaron Williams Software Engineer Cavium, Inc. (408) 943-7198 (510) 789-8988 (cell)
participants (7)
-
Aaron Williams
-
Albert ARIBAUD
-
Andre Renaud
-
Michael Pratt
-
Scott Wood
-
Simon Glass
-
Wolfgang Denk