Re: [U-Boot] [PATCH 14/28] drivers/fsl-mc: Changed MC firmware loading for new boot architecture

On Tue, 24 Mar 2015 21:32:56 -0500 Stuart Yoder b08248@gmail.com wrote:
Kim,
why is this being taken off-list? Adding it back. Also, please don't top-post.
I think it is premature to start focusing on saving single digit milliseconds in u-boot. The 35 second boot time you are seeing on some ls2085 boards is due to the fact that the MC was running with it's CPU caches off until a few weeks ago. DDR is still running at the slowest speed, which will affect the MC's performance. It's still early in hardware bringup and things are not even stable yet.
I'm still convinced that the MC itself probably has significant performance work.
The 1ms delay used polling for MC to boot has nothing to do with DPL size. DPL processing is a separate, later initialization step. We're just waiting for the MC to initialize here. The MC boot time shouldn't vary. My visual reading watching things boot is that the MC boot takes a few hundred milliseconds. I don't see what's wrong with a 1 ms polling delay here.
On a system with the MC running with caches on, u-boot took 20 seconds to boot. The MC took about 5 seconds of that, most of that in DPL processing.
We would welcome help analyzing u-boot boot time and where the time is going. But, seriously, saving 1 ms is not going to help at all.
I did a quick experiment and saved ~50ms when setting both udelays down to 50, which, sure, isn't a big deal given it's out of the order of 10sec, but it's something, and, like I said, development time for our users can be seriously helped if MC initialization were omitted from the main u-boot boot sequence, and occurred only when necessary, i.e., when users want to use one of the DP net interfaces. Most of the time when we boot today, we don't use DP net interfaces, so MC init - with or without DPL processing - is just a waste of our time!
Thanks,
Kim
Thanks, Stuart
On Tue, Mar 24, 2015 at 10:35 AM, Kim Phillips kim.phillips@freescale.com wrote:
On Tue, 24 Mar 2015 10:14:39 -0500 Rivera Jose-B46482 German.Rivera@freescale.com wrote:
From: Kim Phillips [mailto:kim.phillips@freescale.com] Sent: Monday, March 23, 2015 5:06 PM
On Mon, 23 Mar 2015 16:15:56 -0500 Rivera Jose-B46482 German.Rivera@freescale.com wrote:
From: Kim Phillips [mailto:kim.phillips@freescale.com] Sent: Monday, March 23, 2015 3:34 PM
On Mon, 23 Mar 2015 15:06:11 -0500 Rivera Jose-B46482 German.Rivera@freescale.com wrote:
> > From: Kim Phillips [mailto:kim.phillips@freescale.com] > > Sent: Thursday, March 19, 2015 12:53 PM > > > > On Thu, 19 Mar 2015 09:45:45 -0700 York Sun > > yorksun@freescale.com wrote: > > > > > From: "J. German Rivera" German.Rivera@freescale.com > > > > > > Changed MC firmware loading to comply with the new MC boot > > architecture. > > > Flush D-cache hierarchy after loading MC images. Add > > > environment variables "mcboottimeout" for MC boot timeout in > > > milliseconds, "mcmemsize" for MC DRAM block size. Check MC > > > boot status before calling flib functions. > > > > Can we just assign actual and/or optimal values for
'mcboottimeout'
> > and 'mcmemsize' instead of making them environment variables? > > > Having environment variables gives us the flexibility if these > values need to be changed for a given customer configuration. The > actual
what defines a 'customer configuration,' and how does that manifest itself at u-boot boot time?
A DPL (data path layout - a device-tree-like structure describing The DPAA2 objects created at boot time and their connections)
Is it the amount of time it takes to load (and execute?) firmare?
Yes, bigger DPLs take longer to process by the MC.
Why isn't customer-specific firmware being loaded via linux? All u-boot needs is basic networking, pretty static setup: fixed numbers for both memsize & timeout.
This is not customer-specific firmware. What is customer-specific is
just the DPL.
In order to have networking in u-boot, we need to load the MC firmware in u-boot, For cases in which the target system has only DPAA2-based
network interfaces.
ok, for that case, when time comes for u-boot to do some DPAA2 networking arrives (i.e., we shouldn't have to be loading firmware at board boot- time), then we should load a minimal DPL for the number of singular, non- virtual/switch, etc., interfaces for that board just to tftp: this shouldn't be a big DPL at all, and its time complexity is fixed.
It is up to the customer to decide what kind of DPL they want to have.
true, but in this case the 'customer' is the average upstream u-boot user, presumably whose DPL is the simplest ethernet use-case for tftp'ing kernels.
> boot time of the MC and the amount of memory needed by the MC is > dependent on how big/complex is the DPL used. Also, the memory > needed by the MC needs to account for how much memory is needed > for AIOP programs, which may depend on how big/complex they are.
ok, that helps (modulo not knowing what 'DPL' is), but still, the massive customer configurations should be being loaded via linux' firmware loading infrastructure: u-boot should be using a static image for u-boot's needs.
> > > +static int wait_for_mc(bool booting_mc, u32 *final_reg_gsr) { > > > + u32 reg_gsr; > > > + u32 mc_fw_boot_status; > > > + unsigned long timeout_ms = get_mc_boot_timeout_ms(); > > > + struct mc_ccsr_registers __iomem *mc_ccsr_regs = > > > +MC_CCSR_BASE_ADDR; > > > + > > > + dmb(); > > > + debug("Polling mc_ccsr_regs->reg_gsr ...\n"); > > > + assert(timeout_ms > 0); > > > + for (;;) { > > > + udelay(1000); /* throttle polling */ > > > > does this really need to be a whole 1ms? > > It is unlikely that the MC fw will boot in less than 1 ms.
is wait_for_mc() only called for the boot command, or all commands?
I see: there's a udelay(500) in mc_send_command(), which is too high, too, IMO, but I'm not that familiar with the h/w: How long does the shortest command take?
Can you answer this?
> So, checking more frequently than 1 ms is not necessary.
yes it is, because e.g., if it takes 1.1ms we will have wasted 0.9ms on this.
How significant is to save 0.9ms of the whole boot time?
Why waste 0.9ms of boot time when there's no need? It already takes the boards *way* too long to boot, and now I'm understanding mc_send_command's delay should probably be adjusted, too.
I have not heard any complain about RDB/QDS boards taking too long to boot Due to this "wasteds 0.9ms".
I'm complaining now :)
And it's not just about this singleton constant; it's about the general approach to not optimizing the code, vis-a-vis the mc_send_command() delay I bring up above.
Can you support your statement about LS2 RDB/QDS boards taking too long to boot with actual numbers? Otherwise, I will not make any change.
A board takes anywhere from 17 to 35 seconds to boot, and, from the prompts, a lot - if not most - of this time is spent during MC processing. If we take boot time down to say half of that, that would work miracles in how we spend our development time on them.
As the comment in the code says, the intent was to throttle down the polling, to reduce traffic on the system bus due to polling. This traffic competes with the MC itself accessing the system bus, as it boots. Having the polling traffic get in the way of the MC traffic may increase the MC boot time. Too small delay between polls may cause the MC boot time to increase more than the .9ms you are concerned of
wasting in the delay.
What value would you suggest to use for the delay instead 1000ms?
I don't know MC h/w: what's the shortest boot time given a standard simple "DPL"?: A small fraction of that.
I will not make any change at this time as there is no evidence that this optimization is actually needed.
this kind of attitude leaves a lot to be desired...:(
Kim _______________________________________________ U-Boot mailing list U-Boot@lists.denx.de http://lists.denx.de/mailman/listinfo/u-boot

-----Original Message----- From: Kim Phillips [mailto:kim.phillips@freescale.com] Sent: Wednesday, March 25, 2015 4:13 PM To: Stuart Yoder Cc: Rivera Jose-B46482; Sun York-R58495; u-boot@lists.denx.de Subject: Re: [U-Boot] [PATCH 14/28] drivers/fsl-mc: Changed MC firmware loading for new boot architecture
On Tue, 24 Mar 2015 21:32:56 -0500 Stuart Yoder b08248@gmail.com wrote:
Kim,
why is this being taken off-list? Adding it back. Also, please don't top-post.
I think it is premature to start focusing on saving single digit milliseconds in u-boot. The 35 second boot time you are seeing on some ls2085 boards is due to the fact that the MC was running with it's CPU caches off until a few weeks ago. DDR is still running at the slowest speed, which will affect the MC's performance. It's still early in hardware bringup and things are not even stable yet.
I'm still convinced that the MC itself probably has significant performance work.
The 1ms delay used polling for MC to boot has nothing to do with DPL size. DPL processing is a separate, later initialization step. We're just waiting for the MC to initialize here. The MC boot time shouldn't vary. My visual reading watching things boot is that the MC boot takes a few hundred milliseconds. I don't see what's wrong with a 1 ms polling delay here.
On a system with the MC running with caches on, u-boot took 20 seconds to boot. The MC took about 5 seconds of that, most of that in DPL processing.
We would welcome help analyzing u-boot boot time and where the time is going. But, seriously, saving 1 ms is not going to help at all.
I did a quick experiment and saved ~50ms when setting both udelays down to 50, which, sure, isn't a big deal given it's out of the order of 10sec, but it's something, and, like I said, development time for our users can be seriously helped if MC initialization were omitted from the main u-boot boot sequence, and occurred only when necessary, i.e., when users want to use one of the DP net interfaces. Most of the time when we boot today, we don't use DP net interfaces, so MC init - with or without DPL processing - is just a waste of our time!
The 50ms improvement you claim will not make any difference to save time For development users, since 50ms is not something humans can perceive. As Stuart said, we (the MC team) should first analyze where is the bulk of The DPL processing by the MC fw is being spent, to see how much we can reduce the 5 seconds spent there. That is, we should be concerned first about where we can save big bucks, before being concerned about where we can save one penny or two.
Thanks,
German
Thanks,
Kim
Thanks, Stuart
On Tue, Mar 24, 2015 at 10:35 AM, Kim Phillips kim.phillips@freescale.com wrote:
On Tue, 24 Mar 2015 10:14:39 -0500 Rivera Jose-B46482 German.Rivera@freescale.com wrote:
From: Kim Phillips [mailto:kim.phillips@freescale.com] Sent: Monday, March 23, 2015 5:06 PM
On Mon, 23 Mar 2015 16:15:56 -0500 Rivera Jose-B46482 German.Rivera@freescale.com wrote:
> From: Kim Phillips [mailto:kim.phillips@freescale.com] > Sent: Monday, March 23, 2015 3:34 PM > > On Mon, 23 Mar 2015 15:06:11 -0500 Rivera Jose-B46482 > German.Rivera@freescale.com wrote: > > > > From: Kim Phillips [mailto:kim.phillips@freescale.com] > > > Sent: Thursday, March 19, 2015 12:53 PM > > > > > > On Thu, 19 Mar 2015 09:45:45 -0700 York Sun > > > yorksun@freescale.com wrote: > > > > > > > From: "J. German Rivera" German.Rivera@freescale.com > > > > > > > > Changed MC firmware loading to comply with the new MC > > > > boot > > > architecture. > > > > Flush D-cache hierarchy after loading MC images. Add > > > > environment variables "mcboottimeout" for MC boot > > > > timeout in milliseconds, "mcmemsize" for MC DRAM block > > > > size. Check MC boot status before calling flib
functions.
> > > > > > Can we just assign actual and/or optimal values for
'mcboottimeout'
> > > and 'mcmemsize' instead of making them environment
variables?
> > > > > Having environment variables gives us the flexibility if > > these values need to be changed for a given customer > > configuration. The actual > > what defines a 'customer configuration,' and how does that > manifest itself at u-boot boot time? A DPL (data path layout - a device-tree-like structure describing The DPAA2 objects created at boot time and their connections)
> Is it the amount of time it takes to load (and execute?)
firmare?
Yes, bigger DPLs take longer to process by the MC.
> Why isn't customer-specific firmware being loaded via linux? > All u-boot needs is basic networking, pretty static > setup: fixed numbers for both memsize & timeout. This is not customer-specific firmware. What is customer-specific is
just the DPL.
In order to have networking in u-boot, we need to load the MC firmware in u-boot, For cases in which the target system has only DPAA2-based
network interfaces.
ok, for that case, when time comes for u-boot to do some DPAA2 networking arrives (i.e., we shouldn't have to be loading firmware at board boot- time), then we should load a minimal DPL for the number of singular, non- virtual/switch, etc., interfaces for that board just to tftp: this shouldn't be a big DPL at all,
and its time complexity is fixed.
It is up to the customer to decide what kind of DPL they want to
have.
true, but in this case the 'customer' is the average upstream u-boot user, presumably whose DPL is the simplest ethernet use-case for tftp'ing kernels.
> > boot time of the MC and the amount of memory needed by the > > MC is dependent on how big/complex is the DPL used. Also, > > the memory needed by the MC needs to account for how much > > memory is needed for AIOP programs, which may depend on how
big/complex they are.
> > ok, that helps (modulo not knowing what 'DPL' is), but still, > the massive customer configurations should be being loaded via
linux'
> firmware loading infrastructure: u-boot should be using a > static image for u-boot's needs. > > > > > +static int wait_for_mc(bool booting_mc, u32 > > > > +*final_reg_gsr) { > > > > + u32 reg_gsr; > > > > + u32 mc_fw_boot_status; unsigned long timeout_ms = > > > > +get_mc_boot_timeout_ms(); struct mc_ccsr_registers > > > > +__iomem *mc_ccsr_regs = MC_CCSR_BASE_ADDR; > > > > + > > > > + dmb(); > > > > + debug("Polling mc_ccsr_regs->reg_gsr ...\n"); > > > > + assert(timeout_ms > 0); for (;;) { > > > > + udelay(1000); /* throttle polling */ > > > > > > does this really need to be a whole 1ms? > > > > It is unlikely that the MC fw will boot in less than 1 ms. > > is wait_for_mc() only called for the boot command, or all
commands?
I see: there's a udelay(500) in mc_send_command(), which is too high, too, IMO, but I'm not that familiar with the h/w: How long does the shortest command take?
Can you answer this?
> > So, checking more frequently than 1 ms is not necessary. > > yes it is, because e.g., if it takes 1.1ms we will have > wasted 0.9ms on this. > How significant is to save 0.9ms of the whole boot time?
Why waste 0.9ms of boot time when there's no need? It already takes the boards *way* too long to boot, and now I'm understanding mc_send_command's delay should probably be adjusted,
too.
I have not heard any complain about RDB/QDS boards taking too long to boot Due to this "wasteds 0.9ms".
I'm complaining now :)
And it's not just about this singleton constant; it's about the general approach to not optimizing the code, vis-a-vis the mc_send_command() delay I bring up above.
Can you support your statement about LS2 RDB/QDS boards taking too long to boot with actual numbers? Otherwise, I will not make any
change.
A board takes anywhere from 17 to 35 seconds to boot, and, from the prompts, a lot - if not most - of this time is spent during MC processing. If we take boot time down to say half of that, that would work miracles in how we spend our development time on them.
As the comment in the code says, the intent was to throttle down the polling, to reduce traffic on the system bus due to polling. This traffic competes with the MC itself accessing the system bus, as it boots. Having the polling traffic get in the way of the MC traffic may increase the MC boot time. Too small delay between polls may cause the MC boot time to increase more than the .9ms you are concerned of
wasting in the delay.
What value would you suggest to use for the delay instead
1000ms?
I don't know MC h/w: what's the shortest boot time given a standard simple "DPL"?: A small fraction of that.
I will not make any change at this time as there is no evidence that this optimization is actually needed.
this kind of attitude leaves a lot to be desired...:(
Kim _______________________________________________ U-Boot mailing list U-Boot@lists.denx.de http://lists.denx.de/mailman/listinfo/u-boot

On Thu, 26 Mar 2015 18:57:02 -0500 Rivera Jose-B46482 German.Rivera@freescale.com wrote:
From: Kim Phillips [mailto:kim.phillips@freescale.com] Sent: Wednesday, March 25, 2015 4:13 PM
On Tue, 24 Mar 2015 21:32:56 -0500 Stuart Yoder b08248@gmail.com wrote:
On a system with the MC running with caches on, u-boot took 20 seconds to boot. The MC took about 5 seconds of that, most of that in DPL processing.
We would welcome help analyzing u-boot boot time and where the time is going. But, seriously, saving 1 ms is not going to help at all.
I did a quick experiment and saved ~50ms when setting both udelays down to 50, which, sure, isn't a big deal given it's out of the order of 10sec, but it's something, and, like I said, development time for our users can be seriously helped if MC initialization were omitted from the main u-boot boot sequence, and occurred only when necessary, i.e., when users want to use one of the DP net interfaces. Most of the time when we boot today, we don't use DP net interfaces, so MC init - with or without DPL processing - is just a waste of our time!
The 50ms improvement you claim will not make any difference to save time For development users, since 50ms is not something humans can perceive. As Stuart said, we (the MC team) should first analyze where is the bulk of The DPL processing by the MC fw is being spent, to see how much we can reduce the 5 seconds spent there. That is, we should be concerned first about where we can save big bucks, before being concerned about where we can save one penny or two.
my point is that all 10 seconds of MC processing should be removed from u-boot startup time, and moved to only when u-boot needs to use MC-based ethernet, e.g., when a tftp command is invoked.
and, fwiw, the 50ms figure will improve as MC firmware performance improves.
Kim
participants (2)
-
Jose Rivera
-
Kim Phillips