[U-Boot-Users] intended behavior of bootm

Hi,
I am wondering if bootm behaves correctly on CRC errors in kernel and/or ramdisk images. This is what I observed:
1) I loaded a Linux kernel into RAM at 0x200000 on a 405 system. I loaded an initial ramdisk image into RAM at address 0x300000. Now 'bootm 200000 300000' boots my system correctly.
2) Same loading as above. But I made the kernel image CRC check fail (mw 220000 12345678). I get:
    ...
    Verifying Checksum ... Bad Data CRC
    ERROR: can't get kernel image!
    =>
That's ok.
3) Same loading as above. But I made the ramdisk CRC check fail (mw 320000 12345678). I get:
    ## Booting kernel from Legacy Image at 00200000 ...
    ...
    ## Loading init Ramdisk from Legacy Image at 00300000 ...
    ...
    Verifying Checksum ... Bad Data CRC
    <system reset>
    U-Boot 1.3.2-00450-g77dd47f (Apr 21 2008 - 14:43:23)
Hmm, I expected the same behavior as for a corrupted kernel image. So what should be the correct behavior? I would like to get back to the prompt on any CRC error. So is this a bug?
Matthias

Hi,
after going through the bootm code I found out that setting the 'autostart' variable to 'no' brings me a little closer to what I want. But finally I end up in the enable_interrupts() at the very end of do_bootm(). This freezes my system. The reason for this is the Linux kernel image that is loaded to address 0 and that overwrites the vector table. So reenabling the interrupts in U-Boot with the Linux interrupt table is a bad idea.
So what's the best idea to fix this? I could copy the vector table onto the stack in do_bootm() and copy it back just before reenabling the interrupts.
Any better idea?
Matthias

Matthias Fuchs wrote:
Hi,
after going through the bootm code I found out that setting the 'autostart' variable to 'no' brings me a little closer to what I want. But finally I end up in the enable_interrupts() at the very end of do_bootm(). This freezes my system. The reason for this is the Linux kernel image that is loaded to address 0 and that overwrites the vector table. So reenabling the interrupts in U-Boot with the Linux interrupt table is a bad idea.
No, having your (u-boot) interrupt go off while booting linux is a bad idea.
Which interrupt is going off? Why is it going off (why isn't the hardware put into a quiescent state)?
So what's the best idea to fix this? I could copy the vector table onto the stack in do_bootm() and copy it back just before reenabling the interrupts.
NO NO NO.
Any better idea?
Matthias
That a u-boot initialized interrupt is occurring is wrong and needs to be fixed.
* Traditionally, u-boot does not use interrupts for anything, thus this isn't a problem.
* Proper hardware and device driver convention is that the hardware must be quiescent when linux is started and the linux device driver must (re)configure that hardware the way it wants/needs. Obviously, this is probably a 95% rule (console I/O, memory initialization, some others may violate this rule for practical reasons).
* If your u-boot enables interrupt(s), you MUST disable the interrupt source before starting linux. There is NO graceful way of getting linux to handle an interrupt that was a result of u-boot's running. Starting linux with interrupts disabled is not a good solution - you may get lucky but leaving an active interrupt source is a dangerous game. At best, it is a race condition that you may happen to win today.
Best regards, gvb

Hi Jerry,
On Monday 21 April 2008 17:16, Jerry Van Baren wrote:
Matthias Fuchs wrote:
Hi,
after going through the bootm code I found out that setting the 'autostart' variable to 'no' brings me a little closer to what I want. But finally I end up in the enable_interrupts() at the very end of do_bootm(). This freezes my system. The reason for this is the Linux kernel image that is loaded to address 0 and that overwrites the vector table. So reenabling the interrupts in U-Boot with the Linux interrupt table is a bad idea.
No, having your (u-boot) interrupt go off while booting linux is a bad idea.
U-Boot calls disable_interrupts() in do_bootm(). That's a fact.
Which interrupt is going off? Why is it going off (why isn't the hardware put into a quiescent state)?
So what's the best idea to fix this? I could copy the vector table onto the stack in do_bootm() and copy it back just before reenabling the interrupts.
NO NO NO.
At least this works :-)
Any better idea?
Matthias
That a u-boot initialized interrupt is occurring is wrong and needs to be fixed.
- Traditionally, u-boot does not use interrupts for anything, thus this isn't a problem.
- Proper hardware and device driver convention is that the hardware must be quiescent when linux is started and the linux device driver must (re)configure that hardware the way it wants/needs. Obviously, this is probably a 95% rule (console I/O, memory initialization, some others may violate this rule for practical reasons).
- If your u-boot enables interrupt(s), you MUST disable the interrupt source before starting linux. There is NO graceful way of getting linux to handle an interrupt that was a result of u-boot's running. Starting linux with interrupts disabled is not a good solution - you may get lucky but leaving an active interrupt source is a dangerous game. At best, it is a race condition that you may happen to win today.
So this means that U-Boot calling disable_interrupts() before booting Linux (see do_bootm) is correct. Later the kernel image is loaded at address 0. This overwrites all U-Boot vectors in the first 16k of RAM. So when control is to be passed back to U-Boot after the kernel has been loaded to address 0 and the ramdisk CRC check has failed, U-Boot sees a mixed-up vector table. I think the only ways to fix this are to save the table (as I did for testing) or to check the ramdisk image before uncompressing the kernel to address 0.
Apart from that I just noticed that 'autostart=no' does not help me, because it completely disables booting the kernel from bootm.
So how can I achieve this:
bootm $(kernel_addr_in_flash) $(ramdisk_addr_in_flash); run load_images_from_usb_to_ram; bootm $(kernel_addr_in_ram) $(ramdisk_addr_in_ram)
So if the initial bootm fails because of invalid images, U-Boot should load images from USB media and start them.
Matthias

Matthias Fuchs wrote:
Hi Jerry,
On Monday 21 April 2008 17:16, Jerry Van Baren wrote:
Matthias Fuchs wrote:
Hi,
after going through the bootm code I found out that setting the 'autostart' variable to 'no' brings me a little closer to what I want. But finally I end up in the enable_interrupts() at the very end of do_bootm(). This freezes my system. The reason for this is the Linux kernel image that is loaded to address 0 and that overwrites the vector table. So reenabling the interrupts in U-Boot with the Linux interrupt table is a bad idea.
No, having your (u-boot) interrupt go off while booting linux is a bad idea.
U-Boot calls disable_interrupts() in do_bootm(). That's a fact.
Which interrupt is going off? Why is it going off (why isn't the hardware put into a quiescent state)?
So what's the best idea to fix this? I could copy the vector table onto the stack in do_bootm() and copy it back just before reenabling the interrupts.
NO NO NO.
At least this works :-)
At least once. Today. :-/
Any better idea?
Matthias
That a u-boot initialized interrupt is occurring is wrong and needs to be fixed.
- Traditionally, u-boot does not use interrupts for anything, thus this isn't a problem.
- Proper hardware and device driver convention is that the hardware must be quiescent when linux is started and the linux device driver must (re)configure that hardware the way it wants/needs. Obviously, this is probably a 95% rule (console I/O, memory initialization, some others may violate this rule for practical reasons).
- If your u-boot enables interrupt(s), you MUST disable the interrupt source before starting linux. There is NO graceful way of getting linux to handle an interrupt that was a result of u-boot's running. Starting linux with interrupts disabled is not a good solution - you may get lucky but leaving an active interrupt source is a dangerous game. At best, it is a race condition that you may happen to win today.
So this means that U-Boot calling disable_interrupts() before booting Linux (see do_bootm) is correct. Later the kernel image is loaded at address 0. This overwrites all U-Boot vectors in the first 16k of RAM. So when control is to be passed back to U-Boot after the kernel has been loaded to address 0 and the ramdisk CRC check has failed, U-Boot sees a mixed-up vector table. I think the only ways to fix this are to save the table (as I did for testing)
That is the way of INsanity.
or to check the ramdisk image before uncompressing the kernel to address 0.
That is the way of sanity.
I missed a piece of the puzzle... the problem isn't an interrupt going off in linux-land, the problem is that linux failed to run (CRC error - implies a bad load) and you are trying to recover without resetting the system.
That isn't reliable. It is possible that it could be done, but I wouldn't bet on it being reliable, at least not long term. I would anticipate that to be a very fragile solution - eventually the linux kernel will do *something* different/unexpected before aborting and your recovery mechanism will blow up.
Our current solution revolves around verifying the boot image and aborting the launch *before* attempting to launch it, before it becomes impossible to recover.
In bootm, you will see that there is a "point of no return" corresponding to replacing the interrupt vectors. After the "point of no return" we don't attempt to return from the failed bootm, but rather reset the hardware to recover. You are trying to go *way beyond* the "point of no return" (all the way to some indeterminate point in the linux boot process) and still trying to recover without a reset. Ouch. You don't know (in general) *what* the linux initialization did (it is busy scribbling on memory, if nothing else), so it is impossible to guarantee that what it did is reversible.
Apart from that I just noticed that 'autostart=no' does not help me, because it completely disables booting the kernel from bootm.
So how can I achieve this:
bootm $(kernel_addr_in_flash) $(ramdisk_addr_in_flash); run load_images_from_usb_to_ram; bootm $(kernel_addr_in_ram) $(ramdisk_addr_in_ram)
So if the initial bootm fails because of invalid images, U-Boot should load images from USB media and start them.
The current u-boot has ways to detect invalid images before leaping. The way of sanity is to identify invalid images before leaping, and filling in holes in our detection, as necessary. The new boot image format may also be helpful here.
Matthias
Best regards, gvb

Dear Matthias,
in message 200804211509.43558.matthias.fuchs@esd-electronics.com you wrote:
I am wondering if bootm behaves correctly on CRC errors in kernel and/or ramdisk images. This is what I observed:
Most has already been said in previous replies, so here just a summary of the situation:
- Same loading as above. But I make the ramdisk CRC check fail (mw 320000 12345678).
I get: ## Booting kernel from Legacy Image at 00200000 ... ... ## Loading init Ramdisk from Legacy Image at 00300000 ... ... Verifying Checksum ... Bad Data CRC
<system reset> U-Boot 1.3.2-00450-g77dd47f (Apr 21 2008 - 14:43:23)
Hmm, I expected the same behavior as for a corrupted kernel image.
This expectation is incorrect.
So what should be the correct behavior? I would like to get back to the prompt on any CRC error. So is this a bug?
No, it is not a bug. Kernel image and ramdisk image get processed sequentially. As soon as you uncompress and copy the kernel image to its load address (typically 0x0000 for PowerPC), it will overwrite the exception vectors used by U-Boot. The next interrupt (for example timer) would then kill you. That's why there is a point of no return just before we start uncompressing / loading the kernel image.
Any errors after this point can only be resolved by a reset.
That's intentional and documented. There are no intentions to change this behaviour.
Best regards,
Wolfgang Denk
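
A minimal sketch of the ordering discussed in this thread, not the actual U-Boot source: every image is checked while returning to the prompt is still possible, and only then is the point of no return crossed. The names `struct image`, `image_ok()` and `try_boot()` are hypothetical, and the bitwise CRC-32 below is only assumed to stand in for the data checksum stored in a legacy image header.

```c
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC-32 (reflected polynomial 0xEDB88320). Slow, but needs no table.
 * Assumption: this stands in for the image data checksum. */
uint32_t crc32_calc(const uint8_t *buf, size_t len)
{
    uint32_t crc = 0xFFFFFFFFu;

    for (size_t i = 0; i < len; i++) {
        crc ^= buf[i];
        for (int bit = 0; bit < 8; bit++)
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
    }
    return crc ^ 0xFFFFFFFFu;
}

/* Hypothetical, simplified image: payload plus the CRC recorded for it. */
struct image {
    const uint8_t *data;
    size_t len;
    uint32_t stored_crc;
};

/* Returns 1 if the payload matches its recorded CRC, 0 otherwise. */
int image_ok(const struct image *img)
{
    return crc32_calc(img->data, img->len) == img->stored_crc;
}

enum boot_result { BOOT_ABORTED_SAFE = 0, BOOT_COMMITTED = 1 };

/* Verify kernel AND ramdisk first. Only if both pass do we cross the
 * point of no return, modelled here by setting *vectors_overwritten
 * (a stand-in for copying the kernel over the exception vectors). */
enum boot_result try_boot(const struct image *kernel,
                          const struct image *ramdisk,
                          int *vectors_overwritten)
{
    if (!image_ok(kernel) || !image_ok(ramdisk))
        return BOOT_ABORTED_SAFE; /* nothing touched, caller can try other images */

    *vectors_overwritten = 1;
    return BOOT_COMMITTED;
}
```

With this ordering a bad ramdisk is reported before anything at address 0 has been modified, so the caller can still fall back to a second set of images instead of resetting.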

Thanks for Jerry's and your reply. I see that my expectation was incorrect and I didn't take the words 'point of no return' that seriously.
Now I have to find a (simple) solution to solve my problem:
Typically the 405 board boots from onboard flash. For historical reasons there is a kernel and a ramdisk image (not a multi image and nothing that is aware of any new image format). These images cannot be changed. When either one or both of these images are corrupted, U-Boot should try to load both of them from a USB mass storage device. So what's the best way to do so?
1) Make bootm fail when any image has a CRC error?
2) Add a new command to check images and decide on the result
3) ???
Any idea? I think the idea behind this is clear. When images A are not ok boot images B.
good night Matthias

In message 200804212302.30762.matthias.fuchs@esd-electronics.com you wrote:
Now I have to find a (simple) solution to solve my problem:
Typically the 405 board boots from onboard flash. For historical reasons there is a kernel and a ramdisk image (not a multi image and nothing that is aware of any new image format). These images cannot be changed. When either one or both of these images are corrupted, U-Boot should try to load both of them from a USB mass storage device. So what's the best way to do so?
The key question here is your definition of "corrupted".
If reliability is an issue, you want to implement (1) support for a hardware watchdog combined with (2) support for a boot counter. Then you set "bootlimit" to a reasonable value and "altbootcmd" to the command required to load and boot from USB.
Such a setup will be very robust and handle even situations when the images look good (checksums are OK etc.) but fail to work (for example, because buggy binaries or libraries were included, config files got corrupted, etc.).
- Make bootm fail when any image has a CRC error?
This is trivial to do. Remember that you can always use "imi" to check images; something like
=> imi $kernel_addr && imi $ramdisk_addr && bootm $kernel_addr $ramdisk_addr
would do what you want.
The new image format allows for even fancier methods.
Or implement a boot counter and let the board reset on corrupt images and then use "altbootcmd".
- Add a new command to check images and decide on the result
Not needed. "iminfo" already does that.
Any idea? I think the idea behind this is clear. When images A are not ok boot images B.
As mentioned above, the trick question is how you define when an image is OK.
Best regards,
Wolfgang Denk
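
A minimal sketch of the boot counter idea from this reply, under stated assumptions: the struct and function names are hypothetical, and the rule "run the alternate command once the counter exceeds the limit" is how the "bootlimit" / "altbootcmd" pair is understood here, not a copy of U-Boot's implementation. On real hardware the counter has to live in storage that survives a reset, and the watchdog is what forces that reset when a boot attempt hangs.

```c
#include <stddef.h>

/* Hypothetical model of the state that must survive a reset. */
struct boot_state {
    unsigned long bootcount; /* incremented on every boot attempt */
    unsigned long bootlimit; /* 0 means: no limit configured */
};

/* Count this attempt and pick the command to run for it.
 * While the counter stays within the limit the normal command is used;
 * once it exceeds the limit the alternate command (if any) takes over. */
const char *select_boot_command(struct boot_state *st,
                                const char *bootcmd,
                                const char *altbootcmd)
{
    st->bootcount++;

    if (st->bootlimit != 0 && st->bootcount > st->bootlimit &&
        altbootcmd != NULL)
        return altbootcmd;

    return bootcmd;
}

/* To be called once the booted system considers itself healthy,
 * so that the next power-up starts counting from zero again. */
void mark_boot_successful(struct boot_state *st)
{
    st->bootcount = 0;
}
```

Because the counter is only cleared after a successful start, this also covers images whose checksums are fine but which fail to come up, which a CRC check alone cannot detect.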

Hi Wolfgang,
thanks for your reply. That's the kind of thing I wanted to hear. Now I will start playing around ;-)
Matthias