Re: [U-Boot] [PATCHv2] fsl_ddr: Don't use full 64-bit divides on 32-bit PowerPC

29 Jul 2011


      On Fri, 2011-07-29 at 16:26 -0500, Moffett, Kyle D wrote:
...
On Jul 29, 2011, at 14:46, York Sun wrote:
...
On Fri, 2011-07-29 at 11:35 -0700, York Sun wrote:
...
On Fri, 2011-07-29 at 12:20 -0500, Moffett, Kyle D wrote:
...
On Jul 28, 2011, at 17:41, York Sun wrote:
...
I found a problem with the round up. Please try
picos_to_mclk(mclk_to_picos(3))
You will notice the result is rounded up. I am sure that is unexpected.
Hmm... what does fsl_ddr_get_mem_data_rate() return on your system?
I cannot reproduce this here with my memory freq of 800MHz.
The function has obviously always done rounding, even before I made
any changes.  Have you retested what the math does with the patch
reverted to see if it makes any difference?
Hmm, looking at it, the obvious problem case would be mclk_to_picos()
using a rounded result in a subsequent multiply.  Can you retest by
replacing mclk_to_picos() with the following code; this should keep
the error down by doing the rounding-to-10ps *after* the multiply.
unsigned int mclk_to_picos(unsigned int mclk)
{
        unsigned int data_rate = get_ddr_freq(0);
        unsigned int result;

        /* Round to nearest 10ps, being careful about 64-bit multiply/divide */
        unsigned long long mclk_ps = ULL_2e12 * mclk;

        /* Add 5*data_rate, for rounding */
        mclk_ps += 5*(unsigned long long)data_rate;

        /* Now perform the big divide, the result fits in 32-bits */
        do_div(mclk_ps, data_rate);
        result = mclk_ps;

        /* We still need to round to 10ps */
        return 10 * (result/10);
}
Obviously you could also get rid of the rounding altogether, but I
don't know why it was put in the code in the first place...
I think the problem comes from the round up. But your calculation of
the remainder is not right. You explained it to me with the example of
2*10^12-1. It seemed to be right until I tried another number, 264*10^7.
I think the following calculation of the remainder is right:
    clks_rem = do_div(clks, UL_5POW12);
    clks >>= 13;
    clks_rem |= (clks & (UL_2POW13-1)) * UL_5POW12;
That code you pasted is definitely not what is in the tree.
The in-tree code is:
  /* First multiply the time by the data rate (32x32 => 64) */
  clks = picos * (unsigned long long)get_ddr_freq(0);

  /*
   * Now divide by 5^12 and track the 32-bit remainder, then divide
   * by 2*(2^12) using shifts (and updating the remainder).
   */
  clks_rem = do_div(clks, UL_5pow12);
  clks_rem <<= 13;
  clks_rem |= clks & (UL_2pow13-1);
  clks >>= 13;

  /* If we had a remainder, then round up */
  if (clks_rem)
          clks++;
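For anyone who wants to try these steps on a host machine, they can be sketched roughly as below. This is only an illustration under stated assumptions: div_rem_u64() is a hypothetical stand-in for do_div() with the same contract (quotient stored back, remainder returned), the data rate is passed in as a parameter instead of coming from get_ddr_freq(0), and the two partial remainders are only tested for being non-zero rather than combined into one value. It relies on 5^12 * 2^13 = 2*10^12.

```c
#include <stdint.h>

#define UL_5POW12 244140625UL   /* 5^12 */
#define UL_2POW13 (1UL << 13)   /* 2^13; 5^12 * 2^13 == 2 * 10^12 */

/*
 * Hypothetical host-side stand-in for do_div(): the quotient is stored
 * back into *n and the 32-bit remainder is returned.
 */
static uint32_t div_rem_u64(uint64_t *n, uint32_t base)
{
	uint32_t rem = (uint32_t)(*n % base);

	*n /= base;
	return rem;
}

/* ceil(picos * data_rate / (2 * 10^12)) using one 64-by-32 divide and a shift */
unsigned int picos_to_mclk_sketch(unsigned int picos, unsigned int data_rate)
{
	/* 32x32 => 64 multiply */
	uint64_t clks = (uint64_t)picos * data_rate;

	/* Divide by 5^12, then by 2^13, keeping both partial remainders */
	uint32_t rem5 = div_rem_u64(&clks, UL_5POW12);
	uint32_t rem2 = (uint32_t)(clks & (UL_2POW13 - 1));

	clks >>= 13;

	/* Any non-zero remainder rounds the clock count up */
	if (rem5 || rem2)
		clks++;

	return (unsigned int)clks;
}
```

With a data rate of 800000000, this sketch gives 3 clocks for both 7499ps and 7500ps, and 4 clocks for 7501ps.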
Can you tell me what your value of get_ddr_freq() is?  I may be able to
reproduce the issue here.
264000000
...
Also, can you try your test case with the replacement for mclk_to_picos()
that I pasted above and let me know what happens?
I think the *real* error is that get_memory_clk_period_ps() (which I did
not change) rounds up or down to a multiple of 10ps, and then the *result*
of get_memory_clk_period_ps() is multiplied by the number of clocks.  The
effect is that the 5ps of rounding error that can occur is multiplied by
the number of clocks, and may result in a much-too-large result.  When you
convert that number of picoseconds *back* to memory clocks you get a
result that is too big, but that's because it was too many picoseconds!
Correct. get_memory_clk_period_ps() is causing this problem.
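The effect described above can be reproduced with a small host-side sketch. The function names here are hypothetical and the 10ps rounding is an assumed model of what get_memory_clk_period_ps() does, not a copy of the in-tree code; the data rate is passed as a parameter.

```c
#include <stdint.h>

#define ULL_2E12 2000000000000ULL   /* 2 * 10^12 */

/* Assumed model: clock period in ps, rounded to the nearest 10ps */
unsigned int period_ps_round10(unsigned int data_rate)
{
	/* Nearest 1ps first, then nearest 10ps */
	uint64_t ps = (ULL_2E12 + data_rate / 2) / data_rate;

	return (unsigned int)(10 * ((ps + 5) / 10));
}

/* Rounding happens *before* the multiply, so the error scales with mclk */
unsigned int mclk_to_picos_early(unsigned int mclk, unsigned int data_rate)
{
	return mclk * period_ps_round10(data_rate);
}

/* Plain 64-bit reference: ceil(picos * data_rate / (2 * 10^12)) */
unsigned int picos_to_mclk_ceil(unsigned int picos, unsigned int data_rate)
{
	uint64_t prod = (uint64_t)picos * data_rate;

	return (unsigned int)((prod + ULL_2E12 - 1) / ULL_2E12);
}
```

With a data rate of 264000000 the exact period is about 7575.76ps, which this model rounds to 7580ps. Three clocks then become 22740ps instead of roughly 22727ps, and converting 22740ps back yields 4 clocks. At 800000000 the period is exactly 2500ps, so the round trip returns 3, which would explain why the problem does not show up at 800MHz.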
...
Where did you get the number 264*10^7?  That seems too big to be a number
of picoseconds (that's 2ms!), but way too small to be the product of
get_ddr_freq() and some number of picoseconds.
Forget that. I was just trying.
...
E.g.: For my system here with a DDR frequency of 800MHz, each clock is 2500ps,
so 3 clocks times 2500ps times 800MHz is 6*10^12.
In fact if you give "picos_to_mclk()" any number of picoseconds that covers
at least 1 clock, the initial value of the "clks" variable is guaranteed to
be at least 2*10^12, right?
Using my number, let's run through the math:
So if you start with picos = 7499 (to put rounding to the test) and
get_ddr_freq() is 800MHz:
  clks = 7499 * 800*1000*1000;
Now we have "clks" = 5999200000000, and we divide by 5^12:
  clks_rem = do_div(clks, UL_5pow12);
Since 5999200000000/(5^12) is 24572.7232, we get:
  clks = 24572
  clks_rem = 176562500 (which is equal to 5^12 * 0.7232)
Now left-shift clks_rem by 13:
  clks_rem = 1446400000000
Here I think it is wrong. I want to use the clks_rem value later,
so I changed it to this:
    clks_rem = do_div(clks, UL_5POW12);
    clks_rem += (clks & (UL_2POW13-1)) * UL_5POW12;
    clks >>= 13;
    /* If we had a remainder greater than the 1ps error, then round up */
    if (clks_rem > data_rate)
            clks++;
After fixing get_memory_clk_period_ps() to round up to the nearest 1ps, the
calculation is more accurate (well, not as accurate as floating point). And
now clks_rem holds the correct remainder.
York
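A host-side sketch of the tolerance-based rounding proposed above, for experimenting with specific values. It makes several assumptions that the thread does not confirm: the function name is hypothetical, the data rate is a parameter, plain 64-bit % and / stand in for do_div(), and clks_rem is held in a 64-bit variable because the combined remainder can approach 2*10^12.

```c
#include <stdint.h>

#define UL_5POW12 244140625UL   /* 5^12 */
#define UL_2POW13 (1UL << 13)   /* 2^13 */

/*
 * picos * data_rate / (2 * 10^12), rounded up only when the remainder
 * exceeds data_rate, i.e. more than 1ps worth of the product.
 */
unsigned int picos_to_mclk_tol(unsigned int picos, unsigned int data_rate)
{
	uint64_t clks = (uint64_t)picos * data_rate;

	/* Remainder of the divide by 5^12 (what do_div() would return) */
	uint64_t clks_rem = clks % UL_5POW12;

	clks /= UL_5POW12;

	/* Fold in the bits about to be shifted out, scaled back by 5^12 */
	clks_rem += (clks & (UL_2POW13 - 1)) * UL_5POW12;
	clks >>= 13;

	/* Round up only if the remainder is larger than the 1ps error */
	if (clks_rem > data_rate)
		clks++;

	return (unsigned int)clks;
}
```

At a data rate of 264000000, three clocks of a 7576ps period (22728ps) convert back to 3 in this sketch, while the 22740ps produced by a 7580ps period still gives 4.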