
On Tue, 2015-12-01 at 10:39 -0800, York Sun wrote:
On 11/23/2015 03:19 PM, Scott Wood wrote:
On Fri, 2015-11-20 at 22:33 -0800, York Sun wrote:
Valentin,
Can you refresh my memory on why you needed commit ac337168ad81a18e768e5e3cfff8d229adeb2b25 (patch http://patchwork.ozlabs.org/patch/455439)? Today I bisected an issue back to this commit.
Scott,
Can you help examine this U-Boot commit? Before this commit, invalidate_dcache_range() and flush_dcache_range() did nothing on 512x/5xxx/83xx/85xx. With this patch, I found that e500v2 is broken with an Intel e1000 card when testing v2016-rc1. I didn't catch this issue when testing the patch.
I wonder whether this code is not safe for e500v2, or whether the cache line size should be determined by reading L1CFG0. Why didn't we see any issue in Linux with the same code?
L1_CACHE_SIZE should be 5 as long as CONFIG_E500MC is not defined. I'm not sure what Linux has to do with this since it isn't the same code (in particular, Linux knows that the I/O is coherent and doesn't flush on e500).
What happens if you comment out invalidate_cache_range() but leave flush_cache_range()? We should never need to do the former on e500.
If I comment out invalidate_cache_range(), the problem goes away. I can see several calls to this function in the e1000 driver.
Shall we keep this function only for CONFIG_4xx and CONFIG_MPC86xx? That's what we had before.
Maybe, though it would be good to know what the actual problem is... The driver should not be invalidating anything that was not previously flushed.
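For illustration only, here is a minimal sketch of why invalidating a range that was not previously flushed can be harmful. It assumes the value 5 mentioned above is the log2 of the cache line size (32-byte lines); the helper names are hypothetical and are not U-Boot functions. Cache operations work on whole lines, so a buffer that is not line-aligned shares lines with neighbouring data, and invalidating it can also discard that neighbouring data if it is dirty. Whether this is what happens in the e1000 driver is not established in this thread.

```python
# Assumption: 5 is log2 of the L1 line size, i.e. 32-byte lines.
CACHE_SHIFT = 5
LINE = 1 << CACHE_SHIFT


def lines_touched(start, end):
    """Return the base address of every cache line overlapping [start, end)."""
    first = start & ~(LINE - 1)          # round start down to a line boundary
    return list(range(first, end, LINE))


def collateral_bytes(start, end):
    """Count bytes outside [start, end) that share a cache line with it."""
    first = start & ~(LINE - 1)          # round start down
    last = (end + LINE - 1) & ~(LINE - 1)  # round end up
    return (start - first) + (last - end)


if __name__ == "__main__":
    # A 32-byte buffer that starts 16 bytes into a line spans two lines.
    print([hex(a) for a in lines_touched(0x1010, 0x1030)])
    print(collateral_bytes(0x1010, 0x1030))
    # A line-aligned buffer touches nothing else.
    print(collateral_bytes(0x1000, 0x1040))
```

In the unaligned case, 32 bytes that do not belong to the buffer sit in the lines being invalidated, which is why the start and length passed to an invalidate are normally expected to be line-aligned.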
=> ping $serverip
Using e1000#0 device
Bad trap at PC: 1ffc6f10, SR: 6049434, vector=e00
NIP: 1FFC6F10 XER: 00000000 LR: 1FEF0B6C REGS: 1f8eda70 TRAP: 0e00 DAR: 20000000
MSR: 06049434 EE: 1 PR: 0 FP: 0 ME: 1 IR/DR: 11
GPR00: 1FF457D4 1F8EDB60 1F8EDF14 20000000 00020000 0000001F 00000000 1F8EDAB8
GPR08: A0003818 00004000 00000003 1F8EDB80 1FF457D4 EC662032 1FFC8D50 1FFC6F24
GPR16: 1FFB0074 1FFB005C 1FF59701 1FF5971F 1FF49C37 1FFB0068 00000000 1FFC6F10
GPR24: 1FFB00CC 1FF48D60 00000000 1F8F3D70 1FFC5600 00400000 1FF5C610 00400000
Call backtrace:
1FF2F350 1FF457D4 1FF06888 1FF13AF8 1FEFA180 1FEFA7FC 1FEF9A90 1FEFA124
1FEFA7FC 1FEFAA1C 1FF12B54 1FEFB140 1FF3AA7C 1FEFB454 1FEF0F4C
Exception in kernel pc 1ffc6f10 signal 0
### ERROR ### Please RESET the board ###
0xe00 is an instruction TLB error. Could you dump the TLB when this happens?
DAR of 0x20000000 looks like something that would actually cause a problem, but that's only relevant to data exceptions, not instruction.
What is the instruction at 0x1ffc6f10?
It is not a valid instruction. The relocaddr is 0x1FEF0000. Doing the math, the original instruction would be at 0xF0016F10, which is beyond the image. I think this is caused by wrongly invalidated data.
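The arithmetic can be checked with a short sketch. The three addresses are taken from this thread; the link-time base it prints is only what those numbers imply, not a value confirmed from the board configuration, and the variable names are illustrative.

```python
# Values quoted in this thread.
nip = 0x1FFC6F10        # faulting PC from the dump
relocaddr = 0x1FEF0000  # address U-Boot relocated itself to
link_addr = 0xF0016F10  # pre-relocation address of the same instruction

# Offset of the faulting PC from the start of the relocated image.
offset = nip - relocaddr

# Link-time base implied by the two figures above (derived, not confirmed).
implied_text_base = link_addr - offset

print(hex(offset))
print(hex(implied_text_base))
```

The offset from relocaddr is what would be compared against the image size to decide whether the PC is still inside the image.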
Is the LR valid, or the backtrace?
-Scott