Re: [U-Boot] [PATCHv2] fsl_ddr: Don't use full 64-bit divides on32-bit PowerPC

29 Jul 2011

      On Jul 29, 2011, at 14:46, York Sun wrote:
...
On Fri, 2011-07-29 at 11:35 -0700, York Sun wrote:
...
On Fri, 2011-07-29 at 12:20 -0500, Moffett, Kyle D wrote:
...
On Jul 28, 2011, at 17:41, York Sun wrote:
...
I found a problem with the round up. Please try
picos_to_mclk(mclk_to_picos(3))
You will notice the result is round up. I am sure it is unexpected.
Hmm... what does fsl_ddr_get_mem_data_rate() return on your system?
I cannot reproduce this here with my memory freq of 800MHz.
The function has obviously always done rounding, even before I made
any changes.  Have you retested what the math does with the patch
reverted to see if it makes any difference?
Hmm, looking at it, the obvious problem case would be mclk_to_picos()
using a rounded result in a subsequent multiply.  Can you retest by
replacing mclk_to_picos() with the following code; this should keep
the error down by doing the rounding-to-10ps *after* the multiply.
unsigned int mclk_to_picos(unsigned int mclk)
{
       unsigned int data_rate = get_ddr_freq(0);
       unsigned int result;
   /* Round to nearest 10ps, being careful about 64-bit multiply/divide */
   unsigned long long mclk_ps = ULL_2e12 * mclk;

   /* Add 5*data_rate, for rounding */
   mclk_ps += 5*(unsigned long long)data_rate;

   /* Now perform the big divide, the result fits in 32-bits */
   do_div(mclk_ps, data_rate);
   result = mclk_ps;

   /* We still need to round to 10ps */
   return 10 * (result/10);

}
Obviously you could also get rid of the rounding alltogether, but I
don't know why it was put in the code in the first place...
I think the problem comes from the round up. But your calculation of
remainder is not right. You explained to me with example of 2*10^^12-1.
It seems to be right until I tried another number 264*10^^7. I think the
following calculation of remainder is
clks_rem = do_div(clks, UL_5POW12);
   clks >>= 13;
   clks_rem |= (clks & (UL_2POW13-1)) * UL_5POW12;
That code you pasted is definitely not what is in the tree.
The in-tree code is:
  57  /* First multiply the time by the data rate (32x32 => 64) */
  58  clks = picos * (unsigned long long)get_ddr_freq(0);
  59
  60  /*
  61   * Now divide by 5^12 and track the 32-bit remainder, then divide
  62   * by 2*(2^12) using shifts (and updating the remainder).
  63   */
  64  clks_rem = do_div(clks, UL_5pow12);
  65  clks_rem <<= 13;
  66  clks_rem |= clks & (UL_2pow13-1);
  67  clks >>= 13;
  68
  69  /* If we had a remainder, then round up */
  70  if (clks_rem)
  71          clks++;
Can you tell me what your value of get_ddr_freq() is?  I may be able to
reproduce the issue here.
Also, can you try your test case with the replacement for mclk_to_picos()
that I pasted above and let me know what happens?
I think the *real* error is that get_memory_clk_period_ps() (which I did
not change) rounds up or down to a multiple of 10ps, and then the *result*
of get_memory_clk_period_ps() is multiplied by the number of clocks.  The
effect is that the 5ps of rounding error that can occur is multiplied by
the number of clocks, and may result in a much-too-large result.  When you
convert that number of picoseconds *back* to memory clocks you get a
result that is too big, but that's because it was too many picoseconds!
Where did you get the number 264*10^7?  That seems too big to be a number
of picoseconds (that's 2ms!), but way too small to be the product of
get_ddr_freq() and some number of picoseconds.
EG: For my system here with a DDR frequency of 800MHz, each clock is 2500ps,
so 3 clocks times 2500ps times 800MHz is 6*10^12.
In fact if you give "picos_to_mclk()" any number of picoseconds larger than
at least 1 clock, the initial value of the "clks" variable is guaranteed to
be at least 2*10^12, right?
Using my number, let's run through the math:
So if you start with picos = 7499 (to put rounding to the test) and
get_ddr_freq() is 800MHz:
  clks = 7499 * 800*1000*1000;
Now we have "clks" = 5999200000000, and we divide by 5**12:
  clks_rem = do_div(clks, UL_5pow12);
Since 5999200000000/(5**12) is 24572.7232, we get:
  clks = 24572
  clks_rem = 176562500 (which is equal to 5^12 * 0.7232)
Now left-shift clks_rem by 13:
  clks_rem = 1446400000000
Then bitwise-OR in the lower 13 bits of clks:
  clks_rem |= (24572 & 8191)
  clks_rem = 1446400008188;
Finally, rightshift clks by 13:
  clks = 2
Since there is a remainder, we "round up" from 2 to 3.
These numbers empirically make sense, because I gave the system 7499ps
(which is 3 clocks minus 1ps) and it rounded up properly.
If I give it a simple number like 7500ps, then the math is really easy
and obvious:
  clks = 7500 * 800*1000*100
Then the do_div():
  clks = 24576
  clks_rem = 0
Shifting clks_rem does nothing, because it is zero...
Oring with the lower 13 bits of clks has no effect (it has no low bits set)
Finally, when right-shifting clks by 13:
  clks = 3
This is obviously the correct result.
Cheers,
Kyle Moffett