Re: [U-Boot-Users] cfi_flash.c and lost volatile qualifier

Adrian Filipi wrote:
On Tue, 29 Apr 2008, Jerry Van Baren wrote:
Adrian Filipi wrote:
On Tue, 29 Apr 2008, Wolfgang Denk wrote:
In message alpine.DEB.1.10.0804291604250.32753@pmy.adscville you wrote:
I narrowed down the source of the problem to the loss of the
volatile qualifier on the addr pointer in flash_write_cmd(). Adding the qualifier gets rid of the corruption. In the older 1.2.0 sources, addr was declared as "volatile cfiptr_t addr;".
The volatile should not be needed - the CFI driver should use correct accessor macros instead. See Documentation/volatile-considered-harmful.txt in your Linux kernel source tree...
Best regards,
Wolfgang Denk
Sure, reducing the reliance on volatile is a good idea, but I'm
at a loss for anything better to do.
I'm seeing a real problem that is only fixed by qualifying the
container of the pointer as volatile, i.e. "void *volatile". "volatile void *" has no effect as expected given that the read/write accessors are used now.
The old data type was essentially "volatile void *volatile addr"
and the new type is simply "void *addr". I seem to need at least "void *volatile addr" for things to work.
Note, I'm only seeing this problem on our PXA250 boards. Adrian
Hi Adrian,
Please bottom post.
It may be useful to disassemble cfi_flash.o file (objdump -S) and see what the two different configurations of flash_write_cmd() get turned into in assembly.
Best regards, gvb
I've attached the assembler output as well as the diff to the
version with volatile added. I don't see anything incriminating. Instead of using a register, the variable is read from its location on the stack.
I'm having a hard time seeing how flash could get corrupted by this
code. It's happening during initialization.
Adrian
I'm not an ARMexpert, but if that were a PowerPC CPU, I would *strongly* suspect that the bus interface unit is rearranging the bus operations going to memory (e.g. moving a read ahead of a write or re-ordering the writes). The result is that the write command/data sequence to the flash gets buggered by the re-ordering.
Per my theory, by putting in the "volatile", you force extra reads on the bus which forces the bus interface unit to flush out the flash write sequence in the right order. The fix is totally unrelated the the problem, other than as an unrecognized side effect it "fixes" the problem.
The solution in the PowerPC world is to add a "eieio" or "sync" instruction[1] appropriately[2], which prevents the bus interface unit from reordering the memory operations.
Your task is to dig into the PXA250 manual to (a) figure out what syncs you need and (b) figure out why the bus interface is doing you in. Alternatively, (c) show that I don't know what I'm talking about. Obviously, (a) gets you the best results, so lets root for that. ;-)
HTH, gvb
[1] Sync is a big hammer, eieio is a medium size hammer. Don't use both, PPC people will know you don't know what you are doing. ;-)
[2] The flash command sequence order is critical (e.g. the 55s and AAs). The actual data sequence is not critical. The observed problem is in flash_write_cmd() that is writing the command sequence. Hmmmmm.

[1] Sync is a big hammer, eieio is a medium size hammer. Don't use both, PPC people will know you don't know what you are doing. ;-)
Yet the in_bex()/out_bex() functions in PowerPC linux uses sync and all SOC drivers are encouraged to use them. What a waste :(
Jocke

On Wed, Apr 30, 2008 at 05:11:09PM +0200, Joakim Tjernlund wrote:
Yet the in_bex()/out_bex() functions in PowerPC linux uses sync and all SOC drivers are encouraged to use them. What a waste :(
sync is needed in some of the cases, to sync I/O accesses with DMA buffer accesses. Ideally, we could trust the driver writers to put synchronization in where needed, but it seems Linux has too much x86 heritage for that.
There should at least be raw alternatives, though...
-Scott

On Wed, 2008-04-30 at 10:21 -0500, Scott Wood wrote:
On Wed, Apr 30, 2008 at 05:11:09PM +0200, Joakim Tjernlund wrote:
Yet the in_bex()/out_bex() functions in PowerPC linux uses sync and all SOC drivers are encouraged to use them. What a waste :(
sync is needed in some of the cases, to sync I/O accesses with DMA buffer accesses. Ideally, we could trust the driver writers to put synchronization in where needed, but it seems Linux has too much x86 heritage for that.
Perhaps, is sync needed in this case for non-smp too? or is eieio enough? Anyway, just have a look at ucc_geth and you will see that most such accesses are just about getting the endian right.
There should at least be raw alternatives, though...
There need be a get-the-endian-right-but-no-sync. After all 2.4 managed well without using the in/out be() functions.
Jocke
-Scott

In message 1209569696.16926.53.camel@gentoo-jocke.transmode.se you wrote:
There need be a get-the-endian-right-but-no-sync. After all 2.4 managed well without using the in/out be() functions.
Yes, but try building it with a recent version of GCC and run the code on some PPC systems. There may be some nasty surprises. I remember that we had to fix a couple of drivers in our old 2.4.25 kernel.
Best regards,
Wolfgang Denk

On Wed, Apr 30, 2008 at 05:34:56PM +0200, Joakim Tjernlund wrote:
On Wed, 2008-04-30 at 10:21 -0500, Scott Wood wrote:
On Wed, Apr 30, 2008 at 05:11:09PM +0200, Joakim Tjernlund wrote:
Yet the in_bex()/out_bex() functions in PowerPC linux uses sync and all SOC drivers are encouraged to use them. What a waste :(
sync is needed in some of the cases, to sync I/O accesses with DMA buffer accesses. Ideally, we could trust the driver writers to put synchronization in where needed, but it seems Linux has too much x86 heritage for that.
Perhaps, is sync needed in this case for non-smp too? or is eieio enough?
Yes, sync is needed -- eieio doesn't order between stores to cacheable memory and stores to cache-inhibited memory.
Anyway, just have a look at ucc_geth
Do I have to? :-)
and you will see that most such accesses are just about getting the endian right.
If you mean the descriptor accesses, ordering is relevant there as well.
There should at least be raw alternatives, though...
There need be a get-the-endian-right-but-no-sync.
Agreed. There's cpu_to_be32, etc., but that doesn't fit well with load/store endian-swapping instructions.
After all 2.4 managed well without using the in/out be() functions.
I see in/out_be32() in 2.4... and the significant chunks of code that use volatile pointers instead, I wouldn't consider to be managing "well"; they just happen to work most of the time.
-Scott

On Wed, 2008-04-30 at 11:02 -0500, Scott Wood wrote:
On Wed, Apr 30, 2008 at 05:34:56PM +0200, Joakim Tjernlund wrote:
On Wed, 2008-04-30 at 10:21 -0500, Scott Wood wrote:
On Wed, Apr 30, 2008 at 05:11:09PM +0200, Joakim Tjernlund wrote:
Yet the in_bex()/out_bex() functions in PowerPC linux uses sync and all SOC drivers are encouraged to use them. What a waste :(
sync is needed in some of the cases, to sync I/O accesses with DMA buffer accesses. Ideally, we could trust the driver writers to put synchronization in where needed, but it seems Linux has too much x86 heritage for that.
Perhaps, is sync needed in this case for non-smp too? or is eieio enough?
Yes, sync is needed -- eieio doesn't order between stores to cacheable memory and stores to cache-inhibited memory.
OK, thanks
Anyway, just have a look at ucc_geth
Do I have to? :-)
Yes, it needs a little love here and there :) Especially Timur, his little "rename the tx/rx clock" trick costed me a few days :)
Jocke

Joakim Tjernlund wrote:
[1] Sync is a big hammer, eieio is a medium size hammer. Don't use both, PPC people will know you don't know what you are doing. ;-)
Yet the in_bex()/out_bex() functions in PowerPC linux uses sync and all SOC drivers are encouraged to use them. What a waste :(
Jocke
Well, I was a little terse because I was cross-applying PPC instructions in a ARM discussion. Personally, I prefer to use sync vs. eieio. The size of the hammer isn't that different, I don't believe. The advantage of sync is that it flushes the read/write operation out on the bus *now*. When I'm writing to hardware to control the hardware, *now* is what I want.
The eieio merely guarantees the preceding operations will go out on the bus before following bus operations. The preceding operations could be hung up in the bus interface unit *indefinitely* if no "following" bus operations occur. This is an unlikely occurrence, but could be the result of running out of cache in a tight loop.
For instance, if you do a write to a hardware register, a eieio, and then wait in a tight loop until time goes by (reading the decrementer register) followed by another write, the write1/delay/write2 sequence could actually be delay/write1/write2. Note that the order of the writes on the bus is correct per the eieio, but it isn't what the hardware *needed*.
Illustration - what you did in code and intended to occur: write1 (eieio) delay write2 (eieio)
What actually may happen on the bus: delay write1 (eieio) write2 (eieio)
By using a sync, you guarantee the write isn't delayed: write1 (sync) delay write2 (sync)
Disclaimer: the above explanation is from my fertile imagination. It may or may not happen and *will* happen differently on different PPC processors. For instance, the eieio instruction is actually a NOP in the 603e core because it doesn't reorder bus operations, but it *does* have a BIU that can buffer and delay the bus operations, causing the above timing problem.
I contend using the "sync" instruction will always work correctly and the "eieio" instruction is at best a false economy and at worst a lot of very difficult, mysterious bugs to find, so I'm in agreement with the linux in_bex/out_bex recommendation.
Side note: I don't know if I communicated it properly, but when you see "eieio ; sync" or "sync ; eieio", you know the author of that code doesn't understand sync and eieio. "isync ; sync" is occasionally a valid combination, but I don't believe it is necessary other than when called for by the Users Manual in conjunction with writing to special purpose registers.
Best regards, gvb

In message 48188584.3060109@ge.com you wrote:
The solution in the PowerPC world is to add a "eieio" or "sync" instruction[1] appropriately[2], which prevents the bus interface unit from reordering the memory operations.
No, the solution for ALL architectures is to use the appropriate accessor macros which take care of such things (and probably contain calls like sync or similar).
Best regards,
Wolfgang Denk

Wolfgang Denk wrote:
In message 48188584.3060109@ge.com you wrote:
The solution in the PowerPC world is to add a "eieio" or "sync" instruction[1] appropriately[2], which prevents the bus interface unit from reordering the memory operations.
No, the solution for ALL architectures is to use the appropriate accessor macros which take care of such things (and probably contain calls like sync or similar).
Best regards,
Wolfgang Denk
Agreed.
If my theory is correct, Adrian has turned up a problem with the ARM configuration that apparently only shows up for the PXA250. Or I'm blowing smoke - nothing to see, move along.
The hairs on the back of my neck say it is a sync/bus interface unit problem.
I don't know if it is significant, not being very intimately familiar with ARM (and, in particular, the PXA250), but my quick grep turns up ./include/asm-arm/io.h and a quick browsing shows no syncs. The hairs on the back of my neck are starting to ache. ;-)
static inline void sync(void) { }
#define __arch_getb(a) (*(volatile unsigned char *)(a))
gvb
participants (4)
-
Jerry Van Baren
-
Joakim Tjernlund
-
Scott Wood
-
Wolfgang Denk