Re: [U-Boot] U-boot broken on e500v2 soc

2 Dec 2015


On Tue, 2015-12-01 at 10:39 -0800, York Sun wrote:
On 11/23/2015 03:19 PM, Scott Wood wrote:
On Fri, 2015-11-20 at 22:33 -0800, York Sun wrote:
Valentin,
Can you refresh my memory why you needed this commit
ac337168ad81a18e768e5e3cfff8d229adeb2b25 (patch
http://patchwork.ozlabs.org/patch/455439)?
Today I bisected an issue back to this commit.
Scott,
Can you help to examine this u-boot commit? Before this commit,
512x/5xxx/83xx/85xx did nothing in invalidate_dcache_range() and
flush_dcache_range(). With this patch, I found e500v2 is broken with an Intel
e1000 card when testing v2016-rc1. I didn't catch this issue when testing this
patch.
I wonder if this code is not safe for e500v2, or if the cache line size
should be determined by reading L1CFG0. Why didn't we see any issue in Linux
with the same code?
L1_CACHE_SHIFT should be 5 as long as CONFIG_E500MC is not defined.  I'm not
sure what Linux has to do with this since it isn't the same code (in
particular, Linux knows that the I/O is coherent and doesn't flush on e500).
What happens if you comment out invalidate_dcache_range() but leave
flush_dcache_range()?  We should never need to do the former on e500.
If I comment out invalidate_dcache_range(), this problem goes away. I can see
several calls of this function in the e1000 driver.
Shall we keep this function only for CONFIG_4xx and CONFIG_MPC86xx? That's what
we had before.
Maybe, though it would be good to know what the actual problem is...  The
driver should not be invalidating anything that was not previously flushed.
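
One way to see why an invalidate without a matching flush is risky: cache maintenance works on whole lines, so a buffer that does not start and end on a line boundary shares lines with its neighbours. The sketch below is illustrative only and is not taken from the U-Boot sources. It assumes 32-byte L1 lines (reading the 5 quoted above as a log2 shift), and the names LINE_SHIFT, line_floor(), line_ceil() and collateral_bytes() are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* ASSUMPTION: 32-byte L1 cache lines (shift of 5), as on a non-E500MC build. */
#define LINE_SHIFT 5u
#define LINE_BYTES (1u << LINE_SHIFT)

/* Round an address down to the start of its cache line. */
static inline uint32_t line_floor(uint32_t addr)
{
	return addr & ~(LINE_BYTES - 1u);
}

/* Round an address up to the next cache-line boundary. */
static inline uint32_t line_ceil(uint32_t addr)
{
	return (addr + LINE_BYTES - 1u) & ~(LINE_BYTES - 1u);
}

/*
 * Count the bytes outside [start, end) that live in the same cache lines
 * as the range. A line-granular invalidate of the range would also drop
 * any dirty data held in those bytes.
 */
static inline uint32_t collateral_bytes(uint32_t start, uint32_t end)
{
	assert(start < end);
	return (start - line_floor(start)) + (line_ceil(end) - end);
}
```

With these definitions, collateral_bytes(0x1010, 0x1030) is 32 (16 bytes before the range and 16 after it), while a fully aligned range such as collateral_bytes(0x1000, 0x1040) is 0. Whether this is what actually goes wrong in the e1000 driver is not established in this thread.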
=> ping $serverip
Using e1000#0 device
Bad trap at PC: 1ffc6f10, SR: 6049434, vector=e00
NIP: 1FFC6F10 XER: 00000000 LR: 1FEF0B6C REGS: 1f8eda70 TRAP: 0e00 DAR: 20000000
MSR: 06049434 EE: 1 PR: 0 FP: 0 ME: 1 IR/DR: 11
GPR00: 1FF457D4 1F8EDB60 1F8EDF14 20000000 00020000 0000001F 00000000 1F8EDAB8
GPR08: A0003818 00004000 00000003 1F8EDB80 1FF457D4 EC662032 1FFC8D50 1FFC6F24
GPR16: 1FFB0074 1FFB005C 1FF59701 1FF5971F 1FF49C37 1FFB0068 00000000 1FFC6F10
GPR24: 1FFB00CC 1FF48D60 00000000 1F8F3D70 1FFC5600 00400000 1FF5C610 00400000
Call backtrace:
1FF2F350 1FF457D4 1FF06888 1FF13AF8 1FEFA180 1FEFA7FC 1FEF9A90
1FEFA124 1FEFA7FC 1FEFAA1C 1FF12B54 1FEFB140 1FF3AA7C 1FEFB454
1FEF0F4C
Exception in kernel pc 1ffc6f10 signal 0
### ERROR ### Please RESET the board ###
0xe00 is an instruction TLB error.  Could you dump the TLB when this happens?
DAR of 0x20000000 looks like something that would actually cause a problem,
but that's only relevant to data exceptions, not instruction.
What is the instruction at 0x1ffc6f10?
It is not a valid instruction. The relocaddr is 0x1FEF0000. Doing the math, the
original instruction would be at 0xF0016F10, which is beyond the image. I think
this is caused by wrongly invalidated data.
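
The address arithmetic above can be reproduced with a small sketch. The faulting PC and the relocation address come from this message. The link base of 0xEFF40000 is an assumption: it is not stated anywhere in the thread and is simply back-computed from the two addresses quoted (0xF0016F10 minus the offset 0xD6F10). The helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define RELOC_ADDR 0x1FEF0000u /* relocaddr, as reported in the message */
#define LINK_BASE  0xEFF40000u /* ASSUMPTION: inferred, not stated in the thread */

/* Offset of a post-relocation PC from the start of the relocated image. */
static inline uint32_t reloc_offset_of(uint32_t pc)
{
	assert(pc >= RELOC_ADDR);
	return pc - RELOC_ADDR;
}

/* Address the same instruction would have had before relocation. */
static inline uint32_t link_addr_of(uint32_t pc)
{
	return LINK_BASE + reloc_offset_of(pc);
}
```

For the faulting PC, reloc_offset_of(0x1FFC6F10) gives 0xD6F10 and link_addr_of(0x1FFC6F10) gives 0xF0016F10, matching the address quoted above.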
Is the LR valid, or the backtrace?
-Scott