[U-Boot] ELF_RELOC causes strange I-cache issues

Hello everybody,
after nailing down a few USB and FAT related bugs we had USB running stable on i.MX31, but suddenly the current mainline code behaves strangely again:
Repeating simple calls like "usb read 80800000 0 1000" will reliably hard hang the system after 3...5 calls.
The problem can be avoided by switching off the instruction cache (using the "icache off" command).
Trying to track down this problem it turns out that somehow the ELF_RELOC patches seem to be responsible for it. I have a source tree that works perfectly fine, with I-caches on, and after cherry-picking the following commits from the elf_reloc branch the problem appears:
92d5ecb 2010-10-13 10:10:21 arm: implement ELF relocations
bafe743 2010-10-13 10:12:52 arm1136, qong: add support for ELF relocations
However, we cannot find a real cause in the modified code.
Here my request for help:
- Has anybody experienced similar problems?
- Did your tests of the elf_reloc code include any thorough testing of USB mass storage devices?
- If you have any suitable hardware around, could you please run a few such tests? As mentioned above, a simple "usb read <addr> 0 1000", repeated 5 times or so, should be sufficient. If you want to be sure, increase the block count and repeat more often.
All ideas welcome. Thanks a lot in advance.
Best regards,
Wolfgang Denk

Le 20/10/2010 20:49, Wolfgang Denk a écrit :
Hello everybody,
after nailing down a few USB and FAT related bugs we had USB running stable on i.MX31, but suddenly the current mainline code behaves strangely again:
Repeating simple calls like "usb read 80800000 0 1000" will reliably hard hang the system after 3...5 calls.
The problem can be avoided by switching off the instruction cache (using the "icache off" command).
Trying to track down this problem it turns out that somehow the ELF_RELOC patches seem to be responsible for it. I have a source tree that works perfectly fine, with I-caches on, and after cherry-picking the following commits from the elf_reloc branch the problem appears:
92d5ecb 2010-10-13 10:10:21 arm: implement ELF relocations
bafe743 2010-10-13 10:12:52 arm1136, qong: add support for ELF relocations
However, we cannot find a real cause in the modified code.
Here my request for help:
Has anybody experienced similar problems?
Did your tests of the elf_reloc code include any thorough testing of USB mass storage devices?
If you have any suitable hardware around, could you please run a few such tests? As mentioned above, a simple "usb read <addr> 0 1000", repeated 5 times or so, should be sufficient. If you want to be sure, increase the block count and repeat more often.
All ideas welcome. Thanks a lot in advance.
Best regards,
Wolfgang Denk
Is the data cache on or off when you experience the issue? If it was on, can you try with data cache off and instruction cache on?
If the issue arises when both caches are on, then *maybe* the issue is caused by code which was loaded into i-cache *before* it was fixed up, or loaded while its fixups were still in the data cache. However this does not explain everything, since even with instruction cache off, data cache can hold fixups for some time and thus non-cached instruction fetches could return the wrong code.
Still, since ELF fixups are some sort of code self-modification, they must, according to the ARM doc, be followed by an IMB sequence. The exact sequence varies; I will look up and provide the sequence for ARM1136 tomorrow -- unless someone else can do it sooner, of course.
Amicalement,

Dear Albert ARIBAUD,
In message 4CBF4D17.6020403@free.fr you wrote:
Is the data cache on or off when you experience the issue? If it was on,
DC is always off. DC on has always caused problems with lots of drivers, including USB, so we never attempted doing that (except for verification that it indeed does cause problems).
can you try with data cache off and instruction cache on?
Done. The situation was IC off/DC off versus IC on/DC off.
If the issue arises when both caches are on, then *maybe* the issue is caused by code which was loaded into i-cache *before* it was fixed up, or loaded while its fixups were still in the data cache. However this does not explain everything, since even with instruction cache off, data cache can hold fixups for some time and thus non-cached instruction fetches could return the wrong code.
As mentioned, DC has always been off.
Still, since ELF fixups are some sort of code self-modification, they must, according to the ARM doc, be followed by an IMB sequence. The exact sequence varies; I will look up and provide the sequence for ARM1136 tomorrow -- unless someone else can do it sooner, of course.
Thanks.
Best regards,
Wolfgang Denk

Dear Albert ARIBAUD,
In message 4CBF4D17.6020403@free.fr you wrote:
Is the data cache on or off when you experience the issue? If it was on,
DC is always off. DC on has always caused problems with lots of drivers, including USB, so we never attempted doing that (except for verification that it indeed does cause problems).
can you try with data cache off and instruction cache on?
Done. The situation was IC off/DC off versus IC on/DC off.
If the issue arises when both caches are on, then *maybe* the issue is caused by code which was loaded into i-cache *before* it was fixed up, or loaded while its fixups were still in the data cache. However this does not explain everything, since even with instruction cache off, data cache can hold fixups for some time and thus non-cached instruction fetches could return the wrong code.
As mentioned, DC has always been off.
Still, since ELF fixups are some sort of code self-modification, they must, according to the ARM doc, be followed by an IMB sequence. The exact sequence varies; I will look up and provide the sequence for ARM1136 tomorrow -- unless someone else can do it sooner, of course.
Thanks.
Yes, if ARM is anything like ppc you must invalidate the icache line that maps to the modified text. Otherwise you will keep using the old code.
Jocke

Wolfgang (and others who can/want),
Please test this patch; it should add a complete barrier to make sure that all fixups are written to RAM before jumping there, and that no remnants subsist of the old unfixed code in the instruction paths. However, I cannot even do basic testing on it as I have no 1136 board, so I cannot rule out even basic mistakes.
When this works I'll do a proper [PATCH].
Amicalement, Albert.
diff --git a/arch/arm/cpu/arm1136/start.S b/arch/arm/cpu/arm1136/start.S
index 8b63192..f49f1de 100644
--- a/arch/arm/cpu/arm1136/start.S
+++ b/arch/arm/cpu/arm1136/start.S
@@ -257,6 +257,11 @@ fixloop:
 	add	r2, r2, #4
 	cmp	r2, r3
 	bne	fixloop
+	/* fixups done, cleanup caches if used and prefetch buffer */
+	mov	r3, #0
+	mcr	p15, 0, r3, c7, c10, 4	/* data synchronization barrier */
+	mcr	p15, 0, r3, c7, c5, 0	/* invalidate instruction cache */
+	mcr	p15, 0, r3, c7, c5, 4	/* flush prefetch buffer */
 #endif
 #endif	/* #ifndef CONFIG_SKIP_RELOCATE_UBOOT */

Hello Albert,
Albert Aribaud wrote:
Wolfgang (and others who can/want),
Please test this patch; it should add a complete barrier to make sure that all fixups are written to RAM before jumping there, and that no remnants subsist of the old unfixed code in the instruction paths. However, I cannot even do basic testing on it as I have no 1136 board, so I cannot rule out even basic mistakes.
When this works I'll do a proper [PATCH].
Amicalement, Albert.
diff --git a/arch/arm/cpu/arm1136/start.S b/arch/arm/cpu/arm1136/start.S
index 8b63192..f49f1de 100644
--- a/arch/arm/cpu/arm1136/start.S
+++ b/arch/arm/cpu/arm1136/start.S
@@ -257,6 +257,11 @@ fixloop:
 	add	r2, r2, #4
 	cmp	r2, r3
 	bne	fixloop
+	/* fixups done, cleanup caches if used and prefetch buffer */
+	mov	r3, #0
+	mcr	p15, 0, r3, c7, c10, 4	/* data synchronization barrier */
+	mcr	p15, 0, r3, c7, c5, 0	/* invalidate instruction cache */
+	mcr	p15, 0, r3, c7, c5, 4	/* flush prefetch buffer */
 #endif
 #endif	/* #ifndef CONFIG_SKIP_RELOCATE_UBOOT */
Actually I tried an identical patch, but it didn't help :-(
But from reading the ARM manual, such a memory barrier should not hurt here ...
BTW:
I had a fix for this problem, but I do not understand at all what it has to do with relocation (if it really is a problem introduced through relocation ...), nor why a flush_cache helps here, because dcache is off and only icache is on ...
diff --git a/drivers/usb/host/ehci-hcd.c b/drivers/usb/host/ehci-hcd.c
index f44fc4e..3e326ac 100644
--- a/drivers/usb/host/ehci-hcd.c
+++ b/drivers/usb/host/ehci-hcd.c
@@ -203,6 +203,8 @@ static inline void ehci_invalidate_dcache(struct QH *qh)
 static int handshake(uint32_t *ptr, uint32_t mask, uint32_t done, int usec)
 {
 	uint32_t result;
+
+	flush_cache(0, 0);
 	do {
 		result = ehci_readl(ptr);
 		if (result == ~(uint32_t)0)
and the "usb read 80000000 0 1000" command works fine ...
Maybe the I-cache flush doesn't work because the "ARM1136 Errata 411920 Invalidate Instruction Cache operation can fail" interferes here?
bye, Heiko

Hello,
observation here:
ICACHE is always ON. No crash with "usb read 21000000 0 1000" Sorry that I can't reproduce the problem here, not even with 10000 blocks. (tried a few dozen times) (ARM926EJS - AT91SAM9XE) (based on TOT 3ed16071b006dbda65070a4143db74da469f6e30 of 35h ago)
But with DCACHE ON, the USB Stick is not found - maybe a timing problem:
TOP9000> dc off
Data (writethrough) Cache is OFF
TOP9000> usb reset
(Re)start USB...
USB: scanning bus for devices... 2 USB Device(s) found
scanning bus for storage devices... 1 Storage Device(s) found
TOP9000> dc on
Data (writethrough) Cache is ON
TOP9000> usb reset
(Re)start USB...
USB: scanning bus for devices... ERROR: CTL:TIMEOUT
2 USB Device(s) found
scanning bus for storage devices... 0 Storage Device(s) found
TOP9000>
Reinhard

Le 21/10/2010 12:11, Reinhard Meyer a écrit :
Hello,
observation here:
ICACHE is always ON. No crash with "usb read 21000000 0 1000" Sorry that I can't reproduce the problem here, not even with 10000 blocks. (tried a few dozen times) (ARM926EJS - AT91SAM9XE) (based on TOT 3ed16071b006dbda65070a4143db74da469f6e30 of 35h ago)
But with DCACHE ON, the USB Stick is not found - maybe a timing problem:
TOP9000> dc off
Data (writethrough) Cache is OFF
TOP9000> usb reset
(Re)start USB...
USB: scanning bus for devices... 2 USB Device(s) found
scanning bus for storage devices... 1 Storage Device(s) found
TOP9000> dc on
Data (writethrough) Cache is ON
TOP9000> usb reset
(Re)start USB...
USB: scanning bus for devices... ERROR: CTL:TIMEOUT
2 USB Device(s) found
scanning bus for storage devices... 0 Storage Device(s) found
TOP9000>
Reinhard
If the USB controller uses DMA, then the DCache issue probably has to do with making sure to flush the (relevant lines of) cache before memory-to-device DMAs and to invalidate the (again, relevant lines of) cache after device-to-memory DMAs.
And I suggest we move this dcache issue to its own discussion thread.
Amicalement,

Dear Albert ARIBAUD,
observation here:
ICACHE is always ON. No crash with "usb read 21000000 0 1000" Sorry that I can't reproduce the problem here, not even with 10000 blocks. (tried a few dozen times) (ARM926EJS - AT91SAM9XE) (based on TOT 3ed16071b006dbda65070a4143db74da469f6e30 of 35h ago)
I wanted to point out that I cannot produce the ICACHE problem here.
I am willing to test patches if required to see if they break things here.
And I suggest we move this dcache issue to its own discussion thread.
however while testing I stumbled over the DCACHE problem that is probably unrelated, and that sure shall be handled separately.
Reinhard

Hello Albert,
Albert ARIBAUD wrote:
Le 21/10/2010 12:11, Reinhard Meyer a écrit :
Hello,
observation here:
ICACHE is always ON. No crash with "usb read 21000000 0 1000" Sorry that I can't reproduce the problem here, not even with 10000 blocks. (tried a few dozen times) (ARM926EJS - AT91SAM9XE) (based on TOT 3ed16071b006dbda65070a4143db74da469f6e30 of 35h ago)
But with DCACHE ON, the USB Stick is not found - maybe a timing problem:
TOP9000> dc off
Data (writethrough) Cache is OFF
TOP9000> usb reset
(Re)start USB...
USB: scanning bus for devices... 2 USB Device(s) found
scanning bus for storage devices... 1 Storage Device(s) found
TOP9000> dc on
Data (writethrough) Cache is ON
TOP9000> usb reset
(Re)start USB...
USB: scanning bus for devices... ERROR: CTL:TIMEOUT
2 USB Device(s) found
scanning bus for storage devices... 0 Storage Device(s) found
TOP9000>
Reinhard
If the USB controller uses DMA, then the DCache issue probably has to do with making sure to flush the (relevant lines of) cache before memory-to-device DMAs and to invalidate the (again, relevant lines of) cache after device-to-memory DMAs.
Yep. I think if you want to use dcache here, you have to activate CONFIG_EHCI_DCACHE and implement the flush_dcache_range(), invalidate_dcache_range() and flush_invalidate() functions for your platform, if they are not implemented yet.
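The ordering described above (flush before memory-to-device DMA, invalidate after device-to-memory DMA) can be illustrated with a toy model that runs on a host PC. This is only a sketch: it assumes a write-back cache modelled per byte, and every name in it (toy_flush_range, toy_invalidate_range, cpu_write, cpu_read, dma_to_memory, dma_from_memory) is hypothetical and is not the U-Boot API named above.

```c
#include <assert.h>

#define MEM_SIZE 64

enum line_state { NOT_CACHED, CLEAN, DIRTY };

/* Toy model: "ram" is what a DMA device sees, "cache" is what the CPU sees. */
unsigned char ram[MEM_SIZE];
unsigned char cache[MEM_SIZE];
enum line_state state[MEM_SIZE];

/* CPU store: lands in the cache only (write-back assumption). */
void cpu_write(unsigned addr, unsigned char value)
{
	assert(addr < MEM_SIZE);
	cache[addr] = value;
	state[addr] = DIRTY;
}

/* CPU load: served from the cache, filled from RAM on a miss. */
unsigned char cpu_read(unsigned addr)
{
	assert(addr < MEM_SIZE);
	if (state[addr] == NOT_CACHED) {
		cache[addr] = ram[addr];
		state[addr] = CLEAN;
	}
	return cache[addr];
}

/* Flush: write dirty bytes in [start, stop) back to RAM. */
void toy_flush_range(unsigned start, unsigned stop)
{
	assert(start <= stop && stop <= MEM_SIZE);
	for (unsigned a = start; a < stop; a++) {
		if (state[a] == DIRTY) {
			ram[a] = cache[a];
			state[a] = CLEAN;
		}
	}
}

/* Invalidate: drop cached bytes in [start, stop) so the next load refetches. */
void toy_invalidate_range(unsigned start, unsigned stop)
{
	assert(start <= stop && stop <= MEM_SIZE);
	for (unsigned a = start; a < stop; a++)
		state[a] = NOT_CACHED;
}

/* The device accesses RAM directly, behind the cache's back. */
void dma_to_memory(unsigned addr, unsigned char value) { ram[addr] = value; }
unsigned char dma_from_memory(unsigned addr) { return ram[addr]; }
```

In this model the device reads stale RAM until the CPU flushes its store, and the CPU reads a stale cached byte until it invalidates after the device has written. With a write-through cache, as in the log above, only the second hazard would remain.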
And I suggest we move this dcache issue to its own discussion thread.
Yep.
bye, Heiko

On Thursday 21 October 2010 12:34:01 Albert ARIBAUD wrote:
But with DCACHE ON, the USB Stick is not found - maybe a timing problem:
TOP9000> dc off
Data (writethrough) Cache is OFF
TOP9000> usb reset
(Re)start USB...
USB: scanning bus for devices... 2 USB Device(s) found
scanning bus for storage devices... 1 Storage Device(s) found
TOP9000> dc on
Data (writethrough) Cache is ON
TOP9000> usb reset
(Re)start USB...
USB: scanning bus for devices... ERROR: CTL:TIMEOUT
2 USB Device(s) found
scanning bus for storage devices... 0 Storage Device(s) found
TOP9000>
Reinhard
If the USB controller uses DMA, then the DCache issue probably has to do with making sure to flush the (relevant lines of) cache before memory-to-device DMAs and to invalidate the (again, relevant lines of) cache after device-to-memory DMAs.
Correct. Note that the EHCI driver already supports such a mode with d-cache enabled. You need to set CONFIG_EHCI_DCACHE to enable these cache handling functions.
And I suggest we move this dcache issue to its own discussion thread.
Yes. This should be analysed/handled independently.
Cheers, Stefan
-- DENX Software Engineering GmbH, MD: Wolfgang Denk & Detlev Zundel HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany Phone: (+49)-8142-66989-0 Fax: (+49)-8142-66989-80 Email: office@denx.de

Dear Reinhard Meyer,
In message 4CC011B4.9060808@emk-elektronik.de you wrote:
But with DCACHE ON, the USB Stick is not found - maybe a timing problem:
With D-C on, you must build with CONFIG_EHCI_DCACHE; did you do that?
Best regards,
Wolfgang Denk

Le 21/10/2010 11:51, Heiko Schocher a écrit :
Hello Albert,
Albert Aribaud wrote:
Wolfgang (and others who can/want),
Please test this patch; it should add a complete barrier to make sure that all fixups are written to RAM before jumping there, and that no remnants subsist of the old unfixed code in the instruction paths. However, I cannot even do basic testing on it as I have no 1136 board, so I cannot rule out even basic mistakes.
When this works I'll do a proper [PATCH].
Amicalement, Albert.
diff --git a/arch/arm/cpu/arm1136/start.S b/arch/arm/cpu/arm1136/start.S
index 8b63192..f49f1de 100644
--- a/arch/arm/cpu/arm1136/start.S
+++ b/arch/arm/cpu/arm1136/start.S
@@ -257,6 +257,11 @@ fixloop:
 	add	r2, r2, #4
 	cmp	r2, r3
 	bne	fixloop
+	/* fixups done, cleanup caches if used and prefetch buffer */
+	mov	r3, #0
+	mcr	p15, 0, r3, c7, c10, 4	/* data synchronization barrier */
+	mcr	p15, 0, r3, c7, c5, 0	/* invalidate instruction cache */
+	mcr	p15, 0, r3, c7, c5, 4	/* flush prefetch buffer */
 #endif
 #endif	/* #ifndef CONFIG_SKIP_RELOCATE_UBOOT */
Actually I tried an identical patch, but it didn't help :-(
But from reading the ARM manual, such a memory barrier should not hurt here ...
BTW:
I had a fix for this problem, but I do not understand at all what it has to do with relocation (if it really is a problem introduced through relocation ...), nor why a flush_cache helps here, because dcache is off and only icache is on ...
Still, this is a clue.
flush_cache(0, 0);
This amounts to calling arm1136_flush_cache.
Wolfgang/other testers, can you do the following three tests?
1. Replace the three mcr instructions I added in my patch with this single
mcr p15, 0, r1, c7, c5, 0 /* invalidate I-cache */
2. Replace the three mcr instructions I added in my patch with this single
mcr p15, 0, r1, c7, c14, 0 /* invalidate D cache */
3. Replace the three mcr instructions I added in my patch with these two
mcr p15, 0, r0, c7, c7, 0 /* Invalidate I+D+BTB caches */
mcr p15, 0, r0, c8, c7, 0 /* Invalidate Unified TLB */
Maybe the I-cache flush doesn't work because the "ARM1136 Errata 411920 Invalidate Instruction Cache operation can fail" interferes here?
My ARM account does not seem to allow me to get these errata from them. I've just asked for extended access, but meanwhile, is a summary of this erratum freely available?
Amicalement,

Dear Albert ARIBAUD,
In message 4CC01C6B.9090904@free.fr you wrote:
Wolfgang/other testers, can you do the following three tests?
Will do asap, but first I want to share Heiko's latest findings:
With this patch all problems go away, too:
diff --git a/drivers/usb/host/ehci-hcd.c b/drivers/usb/host/ehci-hcd.c
index f44fc4e..64b8012 100644
--- a/drivers/usb/host/ehci-hcd.c
+++ b/drivers/usb/host/ehci-hcd.c
@@ -205,12 +205,12 @@ static int handshake(uint32_t *ptr, uint32_t mask, uint32_t done, int usec)
 	uint32_t result;
 	do {
 		result = ehci_readl(ptr);
+		udelay(1);
 		if (result == ~(uint32_t)0)
 			return -1;
 		result &= mask;
 		if (result == done)
 			return 0;
-		udelay(1);
 		usec--;
 	} while (usec > 0);
 	return -1;
diff --git a/drivers/usb/host/ehci.h b/drivers/usb/host/ehci.h
index d3aa55b..945ab64 100644
--- a/drivers/usb/host/ehci.h
+++ b/drivers/usb/host/ehci.h
@@ -175,7 +175,7 @@ struct qTD {
 	uint32_t qt_buffer_hi[5]; /* Appendix B */
 	/* pad struct for 32 byte alignment */
 	uint32_t unused[3];
-} __attribute__ ((aligned (32)));
+};

 /* Queue Head (QH). */
 struct QH {
I have not the slightest idea how to interpret this, though.
Best regards,
Wolfgang Denk

Dear Albert,
In message 20101021113605.A85D61359B7@gemini.denx.de I wrote:
With this patch all problems go away, too:
Don't count your chickens before they are hatched.
After 8 transfers of 65536 it hung again...
So not solved, but much, much better...
Best regards,
Wolfgang Denk

On 10/21/2010 01:45 PM, Wolfgang Denk wrote:
Dear Albert,
In message 20101021113605.A85D61359B7@gemini.denx.de I wrote:
With this patch all problems go away, too:
Don't count your chickens before they are hatched.
After 8 transfers of 65536 it hung again...
So not solved, but much, much better...
I have tested too, and I cannot really see an improvement. Sometimes it hangs at the first attempt, sometimes it takes a few more attempts (but, as Wolfgang reports, only a few) to get into trouble.
I can confirm Heiko's test: the board works flawlessly when flushing the cache inside the handshake() routine. However, it seems this only solves the issue with USB, and maybe we get the same strange behavior with another driver...
Best regards, Stefano

Dear Albert ARIBAUD,
In message 4CC01C6B.9090904@free.fr you wrote:
By the way:
diff --git a/arch/arm/cpu/arm1136/start.S b/arch/arm/cpu/arm1136/start.S index 8b63192..f49f1de 100644 --- a/arch/arm/cpu/arm1136/start.S +++ b/arch/arm/cpu/arm1136/start.S @@ -257,6 +257,11 @@ fixloop: add r2, r2, #4 cmp r2, r3 bne fixloop
We have a "ble fixloop" here ?
Best regards,
Wolfgang Denk

Dear Albert ARIBAUD,
In message 4CC01C6B.9090904@free.fr you wrote:
Wolfgang/other testers, can you do the following three tests?
My test looks like this:
usb_test=usb start;run usb_test20 usb_test30 usb_test40
usb_test2=usb read 80800000 0 100;date
usb_test20=run usb_test2 usb_test2 usb_test2 usb_test2 usb_test2
usb_test3=usb read 80800000 0 1000;date
usb_test30=run usb_test3 usb_test3 usb_test3 usb_test3 usb_test3
usb_test4=usb read 80800000 0 10000;date
usb_test40=run usb_test4 usb_test4 usb_test4 usb_test4 usb_test4
I.e. I will repeat 5 reads with 256, 4096 resp. 65536 blocks, starting with the small counts, going up.
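The block counts follow from the count arguments being read as hexadecimal. A small host-side sketch of that arithmetic is below; the helper names hex_count and transfer_bytes are made up for illustration, and the block size is left as a parameter because the thread does not state it.

```c
#include <assert.h>
#include <stdlib.h>

/* Parse a count argument as hexadecimal, as the U-Boot prompt does by default. */
unsigned long hex_count(const char *arg)
{
	assert(arg != NULL);
	return strtoul(arg, NULL, 16);
}

/* Bytes moved by one "usb read" for a given (assumed) block size in bytes. */
unsigned long transfer_bytes(const char *count_arg, unsigned long block_size)
{
	return hex_count(count_arg) * block_size;
}
```

For example, hex_count("1000") yields 4096 blocks; with an assumed block size of 512 bytes that would be a 2 MiB transfer.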
- Replace the three mcr instructions I added in my patch with this single
Hangs at 2nd read of 4096 blocks.
- Replace the three mcr instructions I added in my patch with this single
Hangs at 2nd read of 4096 blocks.
- Replace the three mcr instructions I added in my patch with these two
Hangs at 1st read of 4096 blocks.
Best regards,
Wolfgang Denk

Le 21/10/2010 14:00, Wolfgang Denk a écrit :
Dear Albert ARIBAUD,
In message 4CC01C6B.9090904@free.fr you wrote:
Wolfgang/other testers, can you do the following three tests?
My test looks like this:
usb_test=usb start;run usb_test20 usb_test30 usb_test40
usb_test2=usb read 80800000 0 100;date
usb_test20=run usb_test2 usb_test2 usb_test2 usb_test2 usb_test2
usb_test3=usb read 80800000 0 1000;date
usb_test30=run usb_test3 usb_test3 usb_test3 usb_test3 usb_test3
usb_test4=usb read 80800000 0 10000;date
usb_test40=run usb_test4 usb_test4 usb_test4 usb_test4 usb_test4
I.e. I will repeat 5 reads with 256, 4096 resp. 65536 blocks, starting with the small counts, going up.
- Replace the three mcr instructions I added in my patch with this single
Hangs at 2nd read of 4096 blocks.
- Replace the three mcr instructions I added in my patch with this single
Hangs at 2nd read of 4096 blocks.
- Replace the three mcr instructions I added in my patch with these two
Hangs at 1st read of 4096 blocks.
Best regards,
Wolfgang Denk
Hmm... The USB code runs well for 256 blocks? This makes me doubt that this is a code fixup issue, because the code executed is certainly the same regardless of the count of USB blocks (some parts get executed in a loop, but then they've been put in the cache in the first iteration, and that does not depend on the number of iterations), so IMO an i-cache issue would also be the same regardless of block count.
OTOH, block count directly affects how much memory gets written into, and in that respect there can be a difference between fixed relocation and ELF based relocation, because u-boot does not land at the same location in both cases.
On this board, where does u-boot run without ELF relocation? Where does it run with ELF relocation? What size is a single USB block?
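Once those three answers are known, the check being suggested is a plain range-overlap test between the read buffer and the running image. A minimal sketch follows; the function names ranges_overlap and read_hits_image are hypothetical, and no address or block size for this board is assumed here.

```c
#include <assert.h>
#include <stdint.h>

/* Return 1 if [a_start, a_start + a_len) and [b_start, b_start + b_len) share a byte. */
int ranges_overlap(uint64_t a_start, uint64_t a_len,
		   uint64_t b_start, uint64_t b_len)
{
	assert(a_len > 0 && b_len > 0);
	return a_start < b_start + b_len && b_start < a_start + a_len;
}

/*
 * Return 1 if reading `blocks` blocks of `block_size` bytes to `load_addr`
 * would write into the image occupying [img_start, img_start + img_len).
 */
int read_hits_image(uint64_t load_addr, uint64_t blocks, uint64_t block_size,
		    uint64_t img_start, uint64_t img_len)
{
	return ranges_overlap(load_addr, blocks * block_size, img_start, img_len);
}
```

Called with the load address 80800000 from the test, the block count, the real block size and the two candidate U-Boot locations, this would show whether larger reads can reach the relocated image in one case but not the other.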
Amicalement,

Hello Albert,
Albert Aribaud wrote:
Wolfgang (and others who can/want),
Please test this patch; it should add a complete barrier to make sure that all fixups are written to RAM before jumping there, and that no remnants subsist of the old unfixed code in the instruction paths. However, I cannot even do basic testing on it as I have no 1136 board, so I cannot rule out even basic mistakes.
When this works I'll do a proper [PATCH].
Amicalement, Albert.
diff --git a/arch/arm/cpu/arm1136/start.S b/arch/arm/cpu/arm1136/start.S
index 8b63192..f49f1de 100644
--- a/arch/arm/cpu/arm1136/start.S
+++ b/arch/arm/cpu/arm1136/start.S
@@ -257,6 +257,11 @@ fixloop:
 	add	r2, r2, #4
 	cmp	r2, r3
 	bne	fixloop
+	/* fixups done, cleanup caches if used and prefetch buffer */
+	mov	r3, #0
+	mcr	p15, 0, r3, c7, c10, 4	/* data synchronization barrier */
+	mcr	p15, 0, r3, c7, c5, 0	/* invalidate instruction cache */
+	mcr	p15, 0, r3, c7, c5, 4	/* flush prefetch buffer */
 #endif
 #endif	/* #ifndef CONFIG_SKIP_RELOCATE_UBOOT */
Actually I tried an identical patch, but it didn't help :-(
But from reading the ARM manual, such a memory barrier should not hurt here ...
BTW:
I had a fix for this problem, but I do not understand at all what it has to do with relocation (if it really is a problem introduced through relocation ...), nor why a flush_cache helps here, because dcache is off and only icache is on ...
diff --git a/drivers/usb/host/ehci-hcd.c b/drivers/usb/host/ehci-hcd.c
index f44fc4e..3e326ac 100644
--- a/drivers/usb/host/ehci-hcd.c
+++ b/drivers/usb/host/ehci-hcd.c
@@ -203,6 +203,8 @@ static inline void ehci_invalidate_dcache(struct QH *qh)
 static int handshake(uint32_t *ptr, uint32_t mask, uint32_t done, int usec)
 {
 	uint32_t result;
+
+	flush_cache(0, 0);
 	do {
 		result = ehci_readl(ptr);
 		if (result == ~(uint32_t)0)
and the "usb read 80000000 0 1000" command works fine ...
Maybe the I-cache flush doesn't work because the "ARM1136 Errata 411920 Invalidate Instruction Cache operation can fail" interferes here?
On ppc flush means to write the cache to ram, that is not good if the cache is invalid. You want to invalidate the cache instead.
Jocke

Dear Albert Aribaud,
In message 1287652681-4085-1-git-send-email-albert.aribaud@free.fr you wrote:
Wolfgang (and others who can/want),
Please test this patch; it should add a complete barrier to make sure that all fixups are written to RAM before jumping there, and that no remnants subsist of the old unfixed code in the instruction paths. However, I cannot even do basic testing on it as I have no 1136 board, so I cannot rule out even basic mistakes.
When this works I'll do a proper [PATCH].
I tested this, too.
It has a clearly reproducible impact, but unfortunately for the worse. Now even "usb read 80800000 0 100" will hang (i.e. reading 256 blocks); so far, this worked fine, and I needed a count of 4096 to produce the hangs.
Best regards,
Wolfgang Denk

This patch solves a problem with USB hanging under higher load on an i.MX31 board. It falls into the class of typical USB problems and fixes: if you don't understand the real cause, add a delay somewhere.
The problem appeared after introduction of ELF relocation, which results in smaller code, which appears to run faster (probably because it fits better in the cache); turning off the instruction cache, adding debug printf()s and increasing the delay have all been found to make the problem go away.
Moving the original "udelay(1)" up in the code to its new place made the problem appear much less frequently. Increasing the delay to 2 microseconds then made the code run reliably in all (hour-long) tests. To be on the safe side, we set it to 5 microseconds here.
Signed-off-by: Heiko schocher hs@denx.de
Signed-off-by: Wolfgang Denk wd@denx.de
Cc: Remy Bohmer linux@bohmer.net
Cc: Stefano Babic sbabic@denx.de
---
 drivers/usb/host/ehci-hcd.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)
diff --git a/drivers/usb/host/ehci-hcd.c b/drivers/usb/host/ehci-hcd.c
index f44fc4e..982f96e 100644
--- a/drivers/usb/host/ehci-hcd.c
+++ b/drivers/usb/host/ehci-hcd.c
@@ -205,12 +205,12 @@ static int handshake(uint32_t *ptr, uint32_t mask, uint32_t done, int usec)
 	uint32_t result;
 	do {
 		result = ehci_readl(ptr);
+		udelay(5);
 		if (result == ~(uint32_t)0)
 			return -1;
 		result &= mask;
 		if (result == done)
 			return 0;
-		udelay(1);
 		usec--;
 	} while (usec > 0);
 	return -1;
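For readers who want to see the control flow of the patched loop in isolation, here is a host-side sketch. The loop body mirrors the patched handshake() above; everything around it is an assumption made for illustration: ehci_readl() and udelay() are replaced by simple stand-ins, the register is simulated, and poll_fake_register() is a hypothetical helper.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simulated hardware state (stand-ins, not the real U-Boot helpers). */
uint32_t fake_reg;            /* simulated EHCI register */
unsigned reads_until_done;    /* the Nth read sees the register cleared; 0 = never */
unsigned long delayed_usec;   /* total time "spent" in udelay() */

uint32_t ehci_readl(const uint32_t *ptr)
{
	if (reads_until_done > 0 && --reads_until_done == 0)
		fake_reg = 0;     /* the "hardware" finishes */
	return *ptr;
}

void udelay(unsigned long usec)
{
	delayed_usec += usec;
}

/* Same control flow as the patched loop: the delay follows every read. */
int handshake(uint32_t *ptr, uint32_t mask, uint32_t done, int usec)
{
	uint32_t result;

	assert(ptr != NULL);
	do {
		result = ehci_readl(ptr);
		udelay(5);
		if (result == ~(uint32_t)0)
			return -1;
		result &= mask;
		if (result == done)
			return 0;
		usec--;
	} while (usec > 0);
	return -1;
}

/* Arm the simulated register with busy bits, then poll until they clear. */
int poll_fake_register(uint32_t busy_bits, unsigned reads, int budget)
{
	fake_reg = busy_bits;
	reads_until_done = reads;
	delayed_usec = 0;
	return handshake(&fake_reg, busy_bits, 0, budget);
}
```

One visible consequence of moving the delay is that it is now paid on every iteration, including the one that succeeds, so even a register that is ready on the first read costs one delay.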

Hi,
2010/10/22 Wolfgang Denk wd@denx.de:
This patch solves a problem with USB hanging under higher load on an i.MX31 board. It falls into the class of typical USB problems and fixes: if you don't understand the real cause, add a delay somewhere.
The problem appeared after introduction of ELF relocation, which results in smaller code, which appears to run faster (probably because it fits better in the cache); turning off the instruction cache, adding debug printf()s and increasing the delay have all been found to make the problem go away.
Moving the original "udelay(1)" up in the code to its new place made the problem appear much less frequently. Increasing the delay to 2 microseconds then made the code run reliably in all (hour-long) tests. To be on the safe side, we set it to 5 microseconds here.
Signed-off-by: Heiko schocher hs@denx.de
Signed-off-by: Wolfgang Denk wd@denx.de
Cc: Remy Bohmer linux@bohmer.net
Cc: Stefano Babic sbabic@denx.de
 drivers/usb/host/ehci-hcd.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)
Not nice, but I do not see how this could harm something. Applied to u-boot-usb.
Kind regards,
Remy

Hello all,
In message 20101020184930.E89F7136320@gemini.denx.de I wrote:
after nailing down a few USB and FAT related bugs we had USB running stable on i.MX31, but suddenly the current mainline code behaves strangely again:
Repeating simple calls like "usb read 80800000 0 1000" will reliably hard hang the system after 3...5 calls.
The problem can be avoided by switching off the instruction cache (using the "icache off" command).
Trying to track down this problem it turns out that somehow the ELF_RELOC patches seem to be responsible for it. I have a source tree that works perfectly fine, with I-caches on, and after cherry-picking the following commits from the elf_reloc branch the problem appears:
92d5ecb 2010-10-13 10:10:21 arm: implement ELF relocations
bafe743 2010-10-13 10:12:52 arm1136, qong: add support for ELF relocations
Thanks to everybody who spent time and efforts looking into this.
I think we have solved (well, actually worked around) the problem; the solution is (like so often) adding / increasing a delay in the USB code.
I think the ELF relocations only triggered the problem because they resulted in smaller code which (most probably) also executes a bit faster - and the difference was enough to trigger the problem.
After increasing a delay in the USB code, I see no indication whatsoever that the ELF relocation code might be to blame for the issues we observed.
Best regards,
Wolfgang Denk

Le 22/10/2010 14:26, Wolfgang Denk a écrit :
I think we have solved (well, actually worked around) the problem; the solution is (like so often) adding / increasing a delay in the USB code.
Good! Out of curiosity, is this timing inevitable or is there a way to turn it into a wait loop (plus a timeout for security)?
I think the ELF relocations only triggered the problem because they resulted in smaller code which (most probably) also executes a bit faster - and the difference was enough to trigger the problem.
I concur for the execution speed, but not due to ELF relocations per se, rather due to the i-cache being turned on and working correctly, i.e. increasing code speed, to the point that a bad timing condition occurs.
As for size, ELF relocations actually do not change executable code size with respect to "fixed" relocation. GOT relocation, OTOH, would increase the code size and slow it slightly down.
We'll probably see more of these with the increased use of i-cache and d-cache on ARM.
Best regards,
Wolfgang Denk
Amicalement,

Dear Albert ARIBAUD,
In message 4CC18F4C.4090900@free.fr you wrote:
Good! Out of curiosity, is this timing inevitable or is there a way to turn it into a wait loop (plus a timeout for security)?
Well, maybe one could wait - the problem at this time is that we have not the slightest idea what we should actually wait for :-(
We'll probably see more of these with the increased use of i-cache and d-cache on ARM.
I am afraid you are right ;-)
Best regards,
Wolfgang Denk
participants (9)
- Albert ARIBAUD
- Albert Aribaud
- Heiko Schocher
- Joakim Tjernlund
- Reinhard Meyer
- Remy Bohmer
- Stefan Roese
- Stefano Babic
- Wolfgang Denk