RE: [U-Boot-Users] MPC83xx data cache lock?

-----Original Message-----
Just measure the time it takes to initialize ECC memory using either the cache or the DMA method; here is a short summary (don't complain - you asked for it!):
----- quote begin -----
- Read vs. write performance
Writing to DDR memory is *much* slower than reading it.
ECC off read duration: 509 ms write duration: 1546 ms
ECC on read duration: 509 ms write duration: 5703 ms
I ran a test as well; my read vs. write performance is:
ECC off read duration: 4124 ms write duration: 1516 ms
ECC on read duration: 4634 ms write duration: 5703 ms
Because all ways of the data cache are locked, the data cache behaves as if it were cache-inhibited. We access memory with two instructions: stw for 32-bit writes and lwz for 32-bit reads.
My write performance matches yours, but our read performance is very different.
How did you perform the read accesses?
If you only read from memory into a variable and never reference that variable afterwards, the compiler may optimize the load instruction away, unless you declare the variable volatile.
I suggest you check the assembler code to make sure that the load instruction is in the loop and that there are no other memory access instructions in the loop.
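To illustrate the point about the compiler dropping loads, here is a minimal sketch of 32-bit read and write loops. It is not the code used for the numbers above; the function names read_test and write_test are made up for this example, and whether each access really becomes a single lwz or stw still has to be confirmed in the disassembly.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: read every 32-bit word in base[0..words).
 * The volatile qualifier obliges the compiler to perform one load per
 * iteration even though only the last value is used. */
static uint32_t read_test(const volatile uint32_t *base, size_t words)
{
    uint32_t last = 0;
    for (size_t i = 0; i < words; i++)
        last = base[i];      /* expected, not guaranteed, to be one lwz */
    return last;
}

/* Hypothetical helper: write a pattern to every 32-bit word.
 * volatile keeps each store from being merged or removed. */
static void write_test(volatile uint32_t *base, size_t words, uint32_t pattern)
{
    for (size_t i = 0; i < words; i++)
        base[i] = pattern;   /* expected, not guaranteed, to be one stw */
}
```

Without volatile on the pointer, an optimizing compiler is free to delete the whole read loop, which would give an unrealistically short read time.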
With ECC enabled, the write duration is about 4x the duration with ECC off (5703 ms vs. 1516 ms). I think the sub-double-word writes cause read-modify-write bus operations, which make the write accesses take more time.
Why is the read time roughly triple the write time in my test? I will look into this.
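The read-modify-write suspicion can be put into a small counting model. This is only a sketch under the assumption that every sub-double-word write with ECC on costs one extra read, as the manual's wording suggests; write_bus_ops is a made-up name, and the model ignores bursts, refresh and page misses, so it shows where the extra traffic comes from rather than reproducing the measured ratios.

```c
#include <stddef.h>

/* Hypothetical model: number of DRAM operations needed to write
 * total_bytes using accesses of access_bytes each.
 * Assumption: with ECC on, a write narrower than a double word
 * (8 bytes) becomes a read-modify-write, i.e. one read plus one write. */
static size_t write_bus_ops(size_t total_bytes, size_t access_bytes, int ecc_on)
{
    size_t accesses = total_bytes / access_bytes;
    int sub_double_word = access_bytes < 8;

    return (ecc_on && sub_double_word) ? accesses * 2 : accesses;
}
```

For 1024 bytes written with 4-byte stores, the model gives 256 operations with ECC off and 512 with ECC on; with 8-byte stores it gives 128 in both cases.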
Neither the DDR (8349) docs nor the Micron specification for our module clearly indicates whether or how read and write operations differ in timing. There is one pointer for the ECC case, which suggests that writes can take three stages (a full read-modify-write cycle) instead of just one:
"9.5.4 SDRAM Interface Timing - If ECC is disabled, writes smaller than double words are performed by appropriately activating the data mask. If ECC is enabled, the controller performs a read-modify write."
The problem is that we see a 3x difference when ECC is off, and 10x when it is on. We also ran a series of tests with various sizes of the chunks written, to be sure we were not doing the indicated sub-double-word writes, but the results were the same.
Are you sure that you are not doing sub-double-word writes?
I also ran a 64-bit read/write access test over the full memory space.
It accesses memory with the double-precision floating-point load/store instructions: lfd for 64-bit reads and stfd for 64-bit writes.
See the attachment for the code. The result is:
ECC off read duration: 2317 ms write duration: 774 ms
ECC on read duration: 2317 ms write duration: 774 ms
With ECC on, we do double-word writes, so RMW cycles don't happen.
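For readers without the attachment, a 64-bit access loop could look roughly like the sketch below. It is not the attached code: the names write_test64 and read_test64 are invented here, and it relies on the assumption that a PowerPC compiler with hardware floating point turns a volatile double access into a single stfd or lfd, which should be checked in the disassembly.

```c
#include <stddef.h>

/* Hypothetical helper: store an 8-byte value to every double word.
 * Assumption: on a hard-float PowerPC target each store becomes stfd. */
static void write_test64(volatile double *base, size_t dwords, double pattern)
{
    for (size_t i = 0; i < dwords; i++)
        base[i] = pattern;
}

/* Hypothetical helper: load every double word once.
 * Assumption: on a hard-float PowerPC target each load becomes lfd. */
static double read_test64(const volatile double *base, size_t dwords)
{
    double last = 0.0;
    for (size_t i = 0; i < dwords; i++)
        last = base[i];
    return last;
}
```

Using a 64-bit integer type instead would not help on a 32-bit core, because the compiler would normally split each access into two 32-bit ones.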
This is really strange, although at least read operations are not affected by enabling ECC (which is according to the book - there should be minimal overhead put on read operations while ECC on, see 3. below).
- DMA (low) performance
Using DMA for transfers proves very inefficient. As mentioned earlier, the DMA module in the 8349 is different from the one seen in other families, and it struck us as a bit "alien" compared with the rest of the chip (the DMA part of the documentation is rather limited, different in style etc.), as if taken from elsewhere. It is also peculiar in technical aspects: the endianness used is different, so we need to convert the byte order explicitly in s/w.
We tried increasing the local bus clocking but to no avail.
The local bus clock does not affect CSB or DDR performance.
Given that low performance it doesn't make much difference whether ECC is enabled or not:
DMA, ECC on ddr init duration: 6947 ms
DMA, ECC off ddr init duration: 6721 ms
My test data is:
DMA, ECC on ddr init duration: 6945 ms
DMA, ECC off ddr init duration: 6558 ms
That is only slightly different from your result.
There seems something broken with the DMA operations in general as they are way slower than just plain read/write to memory, which is somehow confirmed by your recent communication from the customer.
When all of memory is initialized with the DMA method, as the u-boot code does, the DMA controller reads from memory and then writes to memory, over and over in a loop.
This generates a lot of read accesses to memory, which consumes more time.
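The extra read traffic of a memory-to-memory initialization can be shown with a small host-side model. This is a sketch under stated assumptions, not the u-boot code: memcpy stands in for the DMA engine, the names dma_copy_model and init_by_copy are made up, and the scheme of seeding one block with the CPU and copying it forward is only an assumed example of such a loop.

```c
#include <stddef.h>
#include <string.h>

/* Counters for the DRAM traffic generated by the model. */
struct traffic {
    size_t bytes_read;
    size_t bytes_written;
};

/* Hypothetical stand-in for one memory-to-memory DMA transfer:
 * every byte copied is read once and written once. */
static void dma_copy_model(unsigned char *dst, const unsigned char *src,
                           size_t len, struct traffic *t)
{
    memcpy(dst, src, len);
    t->bytes_read += len;
    t->bytes_written += len;
}

/* Fill mem[0..size) by clearing the first seed bytes with the CPU and
 * then copying that block forward. Assumes 0 < seed <= size. */
static struct traffic init_by_copy(unsigned char *mem, size_t size, size_t seed)
{
    struct traffic t = {0, 0};

    memset(mem, 0, seed);           /* CPU writes only, no reads */
    t.bytes_written += seed;

    for (size_t off = seed; off < size; off += seed) {
        size_t len = (size - off < seed) ? size - off : seed;
        dma_copy_model(mem + off, mem, len, &t);
    }
    return t;
}
```

In this model almost every byte written by the copy loop is paired with a byte read, whereas a plain CPU store loop generates writes only.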
- ECC penalty
As can be seen in results given in 1. enabling ECC puts a huge burden on write access, which is contrary to 8349 UM:
p. 9-27 (above figure 9-24) "When ECC is enabled, one clock cycle is added to the read path to check ECC and correct single-bit errors. ECC generation does not add a cycle to the write path."
----- quote begin -----
Can you explain why writing to ECC memory is 10 times slower than reading?
I hope you can tell me how you measured the read time. Thanks.
Regards, Dave
participants (1)
-
Liu Dave-r63238