
Hi Aaron,
On 13 January 2014 23:13, Aaron Williams <Aaron.Williams@caviumnetworks.com> wrote:
Hi Simon,
Sorry for the long delay.
On 10/17/2013 03:27 PM, Simon Glass wrote:
Hi Aaron,
On Thu, Oct 17, 2013 at 12:24 AM, Aaron Williams <Aaron.Williams@caviumnetworks.com> wrote:
Hi all,
In our bootloader based off of 2013.07 we make extensive use of the flat device tree. In profiling our bootloader in our simulator I found that the function eating up the most time is fdt_next_tag. Looking at it, especially fdt_offset_ptr, it looks like there is a lot of room for improvement especially in the skip name section.
Some of the checks in fdt_offset_ptr also look useless, such as if ((offset + len) < offset), which will always be false, or if (p + len < p), since len is always positive.
Are you using CONFIG_OF_CONTROL?
If so, as a higher-level point, we could bring in an efficient DT library, which converts the FDT into a tree structure for faster parsing. I can point you to a starting point if you like.
Regards, Simon
A higher-level approach is not desirable, since the performance issues show up when we are executing out of NOR flash or in our simulator. Since most of our customers use NOR flash, this is a huge issue for us. We have very little memory available for holding data structures, since basically only the stack is available before relocation.
Taking out these checks significantly sped up our boot process.
If you're checking for wrap-around, the code should not check on every byte; it should check only once whether the range will wrap and handle it accordingly. If we're wrapping, then the device tree is hosed and we have bigger problems.
Are you scanning through the FDT multiple times before relocation? Certainly libfdt is designed to be careful about things, and there are many checks. Are you suggesting adding some kind of CONFIG option to turn them off?
I'm having a hard time understanding why these simple checks (which would expand to a few machine instructions) should take so long. Have you confirmed that removing them does significantly speed up the hardware, and it is not just an artifact of your profiling system?
It is certainly possible to pass a negative number as the device tree offset to any of the exported functions. This should result in correct behaviour (returning an error) rather than a crash.
Of course anything that speeds up the code is welcome so long as it is still correct.
Regards, Simon
-Aaron
-- Aaron Williams Software Engineer Cavium, Inc. (408) 943-7198 (510) 789-8988 (cell)