
Hi André,
On Fri, Aug 30, 2019 at 05:19:02PM +0100, Andre Przywara wrote:
OTOH, let's visualize it:
CPU address                        DRAM address
[w 00000000] SDRAM_BASE            0
[w aa55aa55] SDRAM_BASE + offset   0
Both writes should end up at the same physical location in DRAM.
The following test checks that the values read back are the same. They should be, in this specific case of the Opi3 SBC. The test doesn't check for any specific value, so even if the writes are re-ordered, the value read back should still be the same.
Ah, right. I was actually staring at my modified code, where I try to detect half DQ problems with some weird 3GB setup on the Eachlink H6 Mini. So I explicitly check for and compare to the second value written, and expect this write to have overwritten the first 0 write.
Order of operations is:
writel(0, CONFIG_SYS_SDRAM_BASE);
writel(0xaa55aa55, (ulong)CONFIG_SYS_SDRAM_BASE + offset);
dsb();
From all I can see this means at this point the two writes are "on the bus" ("completed" from the CPU's point of view).
readl(CONFIG_SYS_SDRAM_BASE)
This should actually push the first write to the device, as a read following a write to the same address requires the write to take effect first. Might still end up in a write buffer in the DRAM controller, though.
readl((ulong)CONFIG_SYS_SDRAM_BASE + offset);
The CPU can only re-order either the writes or reads. But neither should matter.
Yes, true, for the existing code (which compares the two reads against each other).
Memory controller is a black box to me and that's where the issue probably lies.
Yes, the plot thickens. Maybe the DRAM controller buffers the write request(s), as it waits for the page(s) to open in the chips, for instance. Now the reads can be satisfied from this buffer, since the addresses match. No idea if it would break down the addresses into rows/cols/bank/rank already to detect the aliasing.
The only time this should fail and return different values when reading is if the hardware does some shortcut/assumption, and reads will not go fully from DRAM after the writes.
Sounds tempting, but with the caches and the MMU off (so treating everything as device memory) I don't see where this would happen. The only case I could imagine would be *in* the DRAM controller, as mentioned above.
Yes, that's what I meant.
So in this case we would need to tell the DRAM controller to complete all outstanding requests (the two writes) first. Then issue the reads. As mentioned, sounds possible, but I have no clue how to do this. A slight hack would indeed be some delay loop, to give the DRAM controller time to fiddle the writes into the DRAM matrix. Or flood it with writes, then DSB to force this buffer to drain?
That's another interesting idea. :) Though we don't know the size of the buffer.
Anyway, I'm failing to reproduce this even without any patches, ATM.
Mmmh, shame. Any changes in your setup you can think of? New power supply? Maybe we should indeed experiment with lower clocks? Did you run stability tests in Linux, to prove that the DRAM timing is actually solid?
I thought it might be temperature related, because now I'm using a fan to cool the board. But that turned out not to be it (I pre-heated the board, and nothing).
No stability tests, but I've been compiling mesa repeatedly on the board without a single obvious issue, I also push a lot of data through the board to the USB connected SSD.
The kernel hasn't panicked unexpectedly yet (several months).
I suspect that the DRAM timing is good.
I'm trying memtester now, and it looks good, too.
regards, o.
Cheers, Andre.
So for the sake of this patch, we should take it. It is needed and apparently solves one issue, and I can't see any harm in it.
Regardless of this I was wondering if in this situation (single in-order core running with MMU and caches off) this is the full story, as the CPU does not have a good reason to reorder.
Could you please experiment with this problem a little bit more?
- What if you just move this second DSB instruction after the second write and have two of them there? Does this also make the problem disappear?
I don't think two dsb()s next to each other will do anything useful. I would think the second dsb() would just be ignored, as the order is already preserved and all memory accesses have been completed:

"A DSB is a memory barrier that ensures that memory accesses that occur before the DSB have completed before the completion of the DSB instruction." (ARMv8 ARM, section B2.3.5: DSB)

I can't imagine this would introduce any kind of relevant delay, as a DSB is just a barrier.
- What if you just replace DSB with udelay(1) / udelay(10) / udelay(100)? Does this make the problem disappear?
I suspect that we may be dealing with some sort of buffering in the DRAM controller itself
Indeed it seems perfectly possible that the DDR3 controller buffers and potentially reorders those two writes. The canonical solution would be to force the writes out by telling the memory controller directly; however, this is probably not only controller-specific, but also undocumented.
I was wondering whether waiting for the next refresh would help (or for the period of one refresh cycle), but I guess the DRAM controller could buffer requests beyond that point.
and the only reason why DSB helps is that it introduces some delay because it's a rather slow instruction.
How much delay a DSB instruction introduces depends on many things, but mostly on the number of outstanding memory requests and the time it takes to complete them. So I am not sure we can call it a "slow instruction" per se.
So to summarise: I think Siarhei has a point in that this is actually more of a timing issue, but the DSB in there is needed and we should apply this patch:
Reviewed-by: Andre Przywara <andre.przywara@arm.com>
Cheers, Andre.