Re: [U-Boot] [PATCH 01/21] Define new system_restart() and emergency_restart()

14 Mar 2011


Dear "Moffett, Kyle D",
In message 613C8F89-3CE5-4C28-A48E-D5C3E8143A4C@boeing.com you wrote:
...
On our boards, when the "reset" button is pressed in hardware, both
processor modules on the board and all the attached hardware reset at
the same time.
OK.  So a sane design would provide a way for both of the processors
to do the same, for example by toggling some GPIO or similar.
...
If just *one* of the 2 CPUs triggers the reset then only *some* of
the attached hardware will be properly reset due to a hardware
erratum, and as a result the board will sometimes hang or corrupt DMA
transfers to the SSDs shortly after reset.
...
Yes, it's a royal pain, but we're stuck with this hardware for the
time being, and if the board can't communicate then it might as well
hang() anyways.
Do you agree that this is a highly board-specific problem (I would
call it a hardware bug, but I don't insist you agree on that term),
and while there is a need for you to work around such behaviour
there is little or no reason to do this, or anything like that, in
common code ?
...
And if there are more things that could be done to provide a "better"
reset, then why should we not always do these?
If the board is in a panic() state it may well have still-running DMA
transfers (such as USB URBs), or be in the middle of writing to
FLASH.
The same (at least having USB or other drivers still being enabled,
and USB writing its SOF counters to RAM) can happen for any call to
the reset() function.  I see no reason for assuming there would be
better or worse conditions to perform a reset.
...
Performing a jump to early-boot code which is only ever tested when
everything is OK and devices are properly initialized is a great way
to cause data corruption.
If there is a software way to prevent such issues, then these steps
should always be performed.
...
I know for a fact that our boards would rather hang forever than try
to reset without cooperation from the other CPU.
As mentioned above, this is a board specific issue that should not
influence common code design.
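
For illustration only, here is a minimal sketch of how such a workaround
could stay in board code while common code remains generic. None of these
names (reset_ops, common_reset, dualcpu_*) are existing U-Boot interfaces;
they are hypothetical. The handshake with the second CPU is stubbed with a
flag, and the GPIO toggle with a counter.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-board reset operations; not an existing U-Boot API. */
struct reset_ops {
	bool (*prepare)(void);	/* board quirk check; NULL if none is needed */
	void (*trigger)(void);	/* actually pull the reset line */
};

enum reset_result { RESET_TRIGGERED, RESET_REFUSED };

/* Common code: knows nothing about any particular board's errata. */
enum reset_result common_reset(const struct reset_ops *ops)
{
	if (ops->prepare && !ops->prepare())
		return RESET_REFUSED;	/* the caller may decide to hang() */
	ops->trigger();
	return RESET_TRIGGERED;
}

/* Board code for an assumed dual-CPU board follows. */
static bool peer_cpu_ready;	/* stand-in for a real inter-CPU handshake */
static int reset_line_pulses;	/* stand-in for toggling a shared GPIO */

void dualcpu_set_peer_ready(bool ready) { peer_cpu_ready = ready; }
int dualcpu_reset_pulses(void) { return reset_line_pulses; }

static bool dualcpu_prepare(void) { return peer_cpu_ready; }
static void dualcpu_trigger(void) { reset_line_pulses++; }

const struct reset_ops dualcpu_reset_ops = {
	.prepare = dualcpu_prepare,
	.trigger = dualcpu_trigger,
};
```

With this split, a board that cannot reset without its peer refuses in its
own prepare step, and every other board keeps a single reset path.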
...
While I was going through the hooks I noticed that several of them were
explicitly NOT safe if the board was in the middle of a panic() for whatever
Can you please provide some specific examples?  I don't understand what
you are talking about.
Ok, using the ppmc7xx board as an example:
    /* Disable and invalidate cache */
    icache_disable();
    dcache_disable();

    /* Jump to cold reset point (in RAM) */
    _start();

    /* Should never get here */
    while(1)
            ;


This board uses the EEPRO100 driver, which appears to set up
statically allocated TX and RX rings which the device performs DMA
to/from.
If this board starts receiving packets and then panic()s, it will
disable address translation and immediately re-relocate U-Boot into
RAM, then zero the BSS. If the network card tries to receive a packet
after BSS is zeroed, it will read a packet buffer address of
(probably) 0x0 from the RX ring and promptly overwrite part of
U-Boot's memory at that address.
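
The sequence described above can be modelled in a few lines of plain C.
This is a toy simulation, not EEPRO100 driver code: the descriptor layout,
sim_mem and the sim_* functions are invented for illustration, and "DMA"
is just a memset into an array standing in for RAM.

```c
#include <stdint.h>
#include <string.h>

#define SIM_MEM_SIZE 256
#define PKT_LEN      16

/* Toy stand-in for physical RAM; offset 0 plays the role of address 0x0. */
static uint8_t sim_mem[SIM_MEM_SIZE];

/* Invented descriptor layout: the NIC reads buf_addr, then writes there. */
struct rx_desc {
	uint32_t buf_addr;
};

/* Statically allocated ring entry, so it lives in BSS. */
static struct rx_desc rx_ring;

/* Driver init points the descriptor at a proper packet buffer.
 * The caller must keep buf_addr + PKT_LEN within SIM_MEM_SIZE. */
void sim_driver_init(uint32_t buf_addr) { rx_ring.buf_addr = buf_addr; }

/* What restarting via _start() does to BSS while the NIC is still live. */
void sim_zero_bss(void) { memset(&rx_ring, 0, sizeof(rx_ring)); }

/* The NIC was never stopped, so it trusts whatever the ring says now. */
void sim_nic_receive(uint8_t fill)
{
	memset(&sim_mem[rx_ring.buf_addr], fill, PKT_LEN);
}

uint8_t sim_mem_at(uint32_t off) { return sim_mem[off]; }
```

Before sim_zero_bss() a received packet lands in the intended buffer;
afterwards the same call writes at offset 0, which is the overwrite
described above.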
Agreed.  So this should be fixed.  One clean way to fix it would be to
help improve the driver model for U-Boot (read: create one) and
make sure drivers get deinitialized in such a case.
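
As a rough idea of what "drivers get deinitialized" could look like, here
is a sketch of a shutdown-hook list that is run before any reset. It is an
assumption for discussion, not an existing U-Boot facility:
register_shutdown_hook() and quiesce_all_devices() are hypothetical names,
the fixed-size table is an arbitrary simplification, and the "NIC" is a
flag.

```c
#include <stddef.h>

#define MAX_SHUTDOWN_HOOKS 8

typedef void (*shutdown_hook_t)(void);

static shutdown_hook_t hooks[MAX_SHUTDOWN_HOOKS];
static size_t hook_count;

/* Called from a driver's init routine; returns 0 on success, -1 if the
 * table is full or fn is NULL. */
int register_shutdown_hook(shutdown_hook_t fn)
{
	if (fn == NULL || hook_count >= MAX_SHUTDOWN_HOOKS)
		return -1;
	hooks[hook_count++] = fn;
	return 0;
}

/* Run the hooks in reverse registration order and forget them, so a
 * second call is a no-op. Returns the number of hooks that were run. */
size_t quiesce_all_devices(void)
{
	size_t ran = 0;

	while (hook_count > 0) {
		hooks[--hook_count]();
		ran++;
	}
	return ran;
}

/* Example driver: stopping the device is modelled as clearing a flag. */
static int nic_dma_active;

static void nic_stop(void) { nic_dma_active = 0; }

int nic_init(void)
{
	nic_dma_active = 1;
	return register_shutdown_hook(nic_stop);
}

int nic_is_active(void) { return nic_dma_active; }
```

Every reset path, including the one taken from panic(), would call
quiesce_all_devices() first, so there is only one path to test.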
...
Since the panic() path is so infrequently used and tested, it's
better to be safe and hang() on the boards which do not have a
reliable hardware-level reset than it is to cause undefined behavior
or potentially corrupt data.
I disagree.  Instead of adding somewhat obscure alternate code paths
(which get tested even less frequently) we should focus on fixing
such problems where we run into them.
Best regards,
Wolfgang Denk
-- 
DENX Software Engineering GmbH,     MD: Wolfgang Denk & Detlev Zundel
HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany
Phone: (+49)-8142-66989-10 Fax: (+49)-8142-66989-80 Email: wd@denx.de
Microsoft Multitasking:
                     several applications can crash at the same time.