Re: [U-Boot-Users] [PATCH] ppc4xx: Refactor ECC POST for AMCC Denali core

14 Jan 2008

      Jerry Van Baren wrote:
...
Larry Johnson wrote:
...
The ECC POST reported intermittent failures running after power-up on
the Korat PPC440EPx board.  Even when the test passed, the debugging
output occasionally reported additional unexpected ECC errors.
This refactoring had two main objectives: (1) minimize the code executed
with ECC enabled during the tests, and (2) add more checking of the
results so any unexpected ECC errors would cause the test to fail.
So far, the refactored test has not reported any intermittent failures.
Further, synchronization instructions appear no longer to be require, so
have been removed.  If intermittent failures do occur in the future, the
refactoring should make the causes easier to identify.
WHOOP, WHOOP, WHOOP, red alert!  "[S]ynchronization instructions appear 
no longer to be require[d], so have been removed".
Synchronization instructions either *ARE* required or *ARE NOT* 
required, there is no "appear".  When sync instructions appear to not be 
required, but actually are required, that is when really obscure bugs 
start happening in the dead of winter off the coast of Alaska / Siberia 
and your boss asks you if you have warm clothes.
I am not familiar with the 4xx family or the PowerPC core that is used 
in it but...
[snip]
...
-static int test_ecc(unsigned long ecc_addr)
+static int test_ecc(uint32_t ecc_addr)
 {

unsigned long value;
volatile unsigned *const ecc_mem = (volatile unsigned *) ecc_addr;
int pret;

uint32_t value;
volatile uint32_t *const ecc_mem = (volatile uint32_t *)ecc_addr;
int ret = 0;

sync();
eieio();
WATCHDOG_RESET();

The combination of "sync" and "eieio" is a strong indication of someone 
sprinkling pixie dust rather than understanding the problem.
"Sync" forces all pending I/O (read/write) operations to be completed 
all the way to memory/hardware register before the instruction 
continues.  Sync guarantees *WHEN* the I/O will complete: NOW.  This is 
a big hammer: it can cause a significant performance hit because it 
stalls the processor BUT it is guaranteed effective (except for the 
places that need an both an isync and a sync combination - thankfully, I 
believe that is only needed in special cases when playing with the 
processor's control registers).
"Eieio" (enforce in-order execution of I/O) is a barrier that says all 
I/O that goes before it must be completed before any I/O that goes after 
it is started.  It *DOES NOT* guarantee *WHEN* the preceding 
reads/writes will be completed.  Theoretically, the bus interface unit 
(BIU) could hold the whole shootin' match for 10 minutes before it does 
the preceding I/O followed by the succeeding I/O.  Eieio is much less 
draconian to the processor than sync (which is why eieios are preferred) 
but an eieio may or may not cause the intended synchronizing result if 
you are relying on a write or read causing the proper effect *NOW*. Note 
that eieios are NOPs to processor cores that don't reorder I/O.
Some PowerPC cores (e.g. the 74xx family) can reorder reads and writes 
in the bus interface unit (some cores, such as the 603e, do *not* 
reorder reads and writes).  This is a performance enhancement... writes 
(generally) are non-blocking to the processor core where a read causes 
the processor to have to wait for the data (which cascades into pipeline 
stalls and performance hits).  The bus is a highly oversubscribed 
resource (core speed / bus speed can be 8x or more).  As a result, you 
want to get reads done ASAP (if possible) and thus it is beneficial to 
move a read ahead of a write.
As you should have picked up by now, a sync (forcing all I/O to 
complete) followed by eieio is silly - the eieio is superfluous.  Seeing 
syncs/isyncs/eieios sprinkled in code is an indication that the author 
didn't understand what was going on and, as a result, kept hitting the 
problem with a bigger and bigger hammer until it appeared to have gone 
away.
Besides read/write reordering problems, the bus interface unit (BIU) can 
"short circuit" a read that follows a write to the same address.  This 
is very likely to be implemented in a given core - it offers a very good 
speed up traded off against a modest increase in complexity to the BIU. 
 The problem is (for instance), if you configure your EDC to store an 
invalid EDC flag, do a write to a test location (which gets held in the 
BIU because the bus is busy), followed by a read of the test location 
(expecting to see an EDC failure), the BIU could return the queued *but 
unwritten* write value.
OK, enough lecturing...
Repeated disclaimer: What I write here is applicable for more complex 
PowerPC implementations.  It may not be applicable for the particular 
4xx core you are running on.  I am not familiar with the 4xx core.
The reason sync/eieio is very likely VITAL in a EDC test is that...

The EDC is being reconfigured in a way that can cause latent EDC

faults.  If a write - for instance a save of a register on the stack - 
gets deferred inadvertently until _after_ the EDC hardware is configured 
to test an error, you could end up with the register save performed with 
inadvertently screwed up EDC.  As a result, when your code executes the 
return postlog (popping the register off the stack) you will get a 
totally unexpected EDC error.

Even if an eieio is used properly, the (EDC reconfiguration, write,

read, EDC fixup) sequence may occur in the right order, but it may occur 
*way later* than you expected which could cause an EDC exception way 
later in the code than you expected.  This would lead to very flaky 
results, unexpected EDC failures, etc.  Hmmmmm.
While the previous scenarios are worst cases, improper sync discipline 
can cause test failures as well.  In fact, it is actually more likely to 
cause problems with the test that the worst case scenario.  For 
instance, if the BIU holds a write and short-circuits subsequent reads, 
you may *think* you are testing EDC but, if the BIU has the write queued 
and the read comes from the BIU rather than actual memory, the BIU will 
inadvertently short circuit your test as well.
By the way, if interrupts are enabled during this time.......... 
(shudders) oooh, good choice to run polled, Dan/Wolfgang!
[major snip]
HTH,
gvb
Yes, it does help.  Thanks, Jerry.
When I first modified the (then) LWMON5 ECC POST to run on Korat and
other 440EPx boards, Stefan urged me to replace the memory accesses
using volatile pointers with accesses via "in_be32()" et al.  As I
understood it, this should have eliminated the need for any external
synchronization.  However, after checking the PPC440 documetation, I
now believe that there should be a "sync" (actually, an "msync", which
is the same opcode) between the memory access and access to the SDRAM-
controller registers.
(From what I can tell, "in_be32()" et al. do not not force completion
of the storage access before returning.  Is this correct?)
Stefan, please hold of on this patch, as I expect to be resubmitting it
soon. :-)
Best regards,
Larry