[U-Boot-Users] intended behavior of bootm


Matthias Fuchs

21 Apr 2008

3:09 p.m.

Hi,

I am wondering if bootm behaves correctly on CRC errors in kernel and/or ramdisk images. This is what I observed:

1) I loaded a Linux kernel into RAM at 0x200000 on a 405 system and an initial ramdisk image into RAM at address 0x300000. Now 'bootm 200000 300000' boots my system correctly.

2) Same loading as above, but I made the kernel image CRC check fail (mw 220000 12345678). I get:

   ...
   Verifying Checksum ... Bad Data CRC
   ERROR: can't get kernel image!
   =>

That's ok.

3) Same loading as above, but I made the ramdisk CRC check fail (mw 320000 12345678). I get:

   ## Booting kernel from Legacy Image at 00200000 ...
   ...
   ## Loading init Ramdisk from Legacy Image at 00300000 ...
   ...
   Verifying Checksum ... Bad Data CRC
   <system reset>
   U-Boot 1.3.2-00450-g77dd47f (Apr 21 2008 - 14:43:23)

Hmm, I expected the same behavior as for a corrupted kernel image. So what should be the correct behavior? I would like to get back to the prompt on any CRC error. So is this a bug?

Matthias


Matthias Fuchs

21 Apr 2008

4:58 p.m.

Hi,

after going through the bootm code I found out that setting the 'autostart' variable to 'no' brings me a little closer to what I want. But finally I end up in the enable_interrupts() at the very end of do_bootm(). This freezes my system. The reason for this is the Linux kernel image that is loaded to address 0 and that overwrites the vector table. So reenabling the interrupts in U-Boot with Linux interrupt table is a bad idea.

So what's the best idea to fix this? I could copy the vector table onto the stack in do_bootm() and copy it back just before reenabling the interrupts.

Any better idea?

Matthias


Jerry Van Baren

5:16 p.m.

Matthias Fuchs wrote:
> Hi,
>
> after going through the bootm code I found out that setting the 'autostart' variable to 'no' brings me a little closer to what I want. But finally I end up in the enable_interrupts() at the very end of do_bootm(). This freezes my system. The reason for this is the Linux kernel image that is loaded to address 0 and that overwrites the vector table. So reenabling the interrupts in U-Boot with Linux interrupt table is a bad idea.

No, having your (u-boot) interrupt go off while booting linux is a bad idea.

Which interrupt is going off? Why is it going off (why isn't the hardware put into a quiescent state)?

> So what's the best idea to fix this? I could copy the vector table onto the stack in do_bootm() and copy it back just before reenabling the interrupts.

NO NO NO.

> Any better idea?
>
> Matthias

That a u-boot initialized interrupt is occurring is wrong and needs to be fixed.

* Traditionally, u-boot does not use interrupts for anything, thus this isn't a problem.

* Proper hardware and device driver convention is that the hardware must be quiescent when linux is started and the linux device driver must (re)configure that hardware the way it wants/needs. Obviously, this is probably a 95% rule (console I/O, memory initialization, some others may violate this rule for practical reasons).

* If your u-boot enables interrupt(s), you MUST disable the interrupt source before starting linux. There is NO graceful way of getting linux to handle an interrupt that was a result of u-boot's running. Starting linux with interrupts disabled is not a good solution - you may get lucky but leaving an active interrupt source is a dangerous game. At best, it is a race condition that you may happen to win today.

Best regards, gvb

Matthias Fuchs

5:43 p.m.

Hi Jerry,

On Monday 21 April 2008 17:16, Jerry Van Baren wrote:

> Matthias Fuchs wrote:
>> Hi,
>>
>> after going through the bootm code I found out that setting the 'autostart' variable to 'no' brings me a little closer to what I want. But finally I end up in the enable_interrupts() at the very end of do_bootm(). This freezes my system. The reason for this is the Linux kernel image that is loaded to address 0 and that overwrites the vector table. So reenabling the interrupts in U-Boot with Linux interrupt table is a bad idea.
>
> No, having your (u-boot) interrupt go off while booting linux is a bad idea.

U-Boot calls disable_interrupts() in do_bootm(). That's a fact.

> Which interrupt is going off? Why is it going off (why isn't the hardware put into a quiescent state)?
>
>> So what's the best idea to fix this? I could copy the vector table onto the stack in do_bootm() and copy it back just before reenabling the interrupts.
>
> NO NO NO.

At least this works :-)

>> Any better idea?
>>
>> Matthias
>
> That a u-boot initialized interrupt is occurring is wrong and needs to be fixed.
>
> * Traditionally, u-boot does not use interrupts for anything, thus this isn't a problem.
>
> * Proper hardware and device driver convention is that the hardware must be quiescent when linux is started and the linux device driver must (re)configure that hardware the way it wants/needs. Obviously, this is probably a 95% rule (console I/O, memory initialization, some others may violate this rule for practical reasons).
>
> * If your u-boot enables interrupt(s), you MUST disable the interrupt source before starting linux. There is NO graceful way of getting linux to handle an interrupt that was a result of u-boot's running. Starting linux with interrupts disabled is not a good solution - you may get lucky but leaving an active interrupt source is a dangerous game. At best, it is a race condition that you may happen to win today.

So this means that U-Boot calling disable_interrupts before booting Linux (see do_bootm) is correct. Later the kernel image is loaded at address 0. This overwrites all U-Boot vectors in the first 16k of RAM. So when, after the kernel has been loaded to address 0, the ramdisk CRC check fails and control is passed back to U-Boot, U-Boot sees a mixed-up vector table. I think the only ways to fix this are to save the table (as I did for testing) or to check the ramdisk image before uncompressing the kernel to address 0.
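The overlap described in this paragraph is plain address arithmetic. Below is a minimal C sketch, assuming (as stated in the paragraph) that U-Boot's vectors occupy the first 16 KiB of RAM on this 405 board. clobbers_vectors() and the VECTOR_* constants are hypothetical names for illustration, not U-Boot code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption taken from the thread: U-Boot's exception vectors occupy
 * the first 16 KiB of RAM on this 405 board (hypothetical constants). */
#define VECTOR_BASE 0x00000000u
#define VECTOR_SIZE 0x00004000u

/* Returns true if writing size bytes at load would overwrite any part of
 * [VECTOR_BASE, VECTOR_BASE + VECTOR_SIZE). Ranges are half-open and the
 * end is computed in 64 bits so that load + size cannot wrap around. */
static bool clobbers_vectors(uint32_t load, uint32_t size)
{
    uint64_t start = load;
    uint64_t end = (uint64_t)load + size;

    if (size == 0)
        return false;
    return start < (uint64_t)VECTOR_BASE + VECTOR_SIZE &&
           end > VECTOR_BASE;
}
```

For a kernel whose load address is 0, as in this thread, any non-empty image overlaps the vectors. That is why every check that should be able to return to the prompt has to run before the kernel is written.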

Apart from that I just noticed that 'autostart=no' does not help me, because it completely disables booting the kernel from bootm.

So how can I achieve this:

bootm $(kernel_addr_in_flash) $(ramdisk_addr_in_flash); run load_images_from_usb_to_ram; bootm $(kernel_addr_in_ram) $(ramdisk_addr_in_ram)

So if the initial bootm fails because of invalid images, U-Boot should load the images from a USB medium and start them.

Matthias

Jerry Van Baren

6:28 p.m.

Matthias Fuchs wrote:

> On Monday 21 April 2008 17:16, Jerry Van Baren wrote:
>> Matthias Fuchs wrote:
>>> So what's the best idea to fix this? I could copy the vector table onto the stack in do_bootm() and copy it back just before reenabling the interrupts.
>>
>> NO NO NO.
>
> At least this works :-)

At least once. Today. :-/

> So this means that U-Boot calling disable_interrupts before booting Linux (see do_bootm) is correct. Later the kernel image is loaded at address 0. This overwrites all U-Boot vectors in the first 16k of RAM. So when, after the kernel has been loaded to address 0, the ramdisk CRC check fails and control is passed back to U-Boot, U-Boot sees a mixed-up vector table. I think the only ways to fix this are to save the table (as I did for testing)

That is the way of INsanity.

> or to check the ramdisk image before uncompressing the kernel to address 0.

That is the way of sanity.

I missed a piece of the puzzle... the problem isn't an interrupt going off in linux-land, the problem is that linux failed to run (CRC error - implies a bad load) and you are trying to recover without resetting the system.

That isn't reliable. It is possible that it could be done, but I wouldn't bet on it being reliable, at least not long term. I would anticipate that to be a very fragile solution - eventually the linux kernel will do *something* different/unexpected before aborting and your recovery mechanism will blow up.

Our current solution revolves around verifying the boot image and aborting the launch *before* attempting to launch it, before it becomes impossible to recover.

In bootm, you will see that there is a "point of no return" corresponding to replacing the interrupt vectors. After the "point of no return" we don't attempt to return from the failed bootm, but rather reset the hardware to recover. You are trying to go *way beyond* the "point of no return" (all the way to some indeterminate point in the linux boot process) and still trying to recover without a reset. Ouch. You don't know (in general) *what* the linux initialization did (it is busy scribbling on memory, if nothing else), so it is impossible to guarantee that what it did is reversible.
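The "verify first, then leap" ordering can be sketched in C. This is a minimal model, not the real do_bootm(). The struct, the result codes and boot_sketch() are hypothetical, and the CRC and load results are passed in as flags instead of being computed.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical result codes for this sketch. */
enum boot_result { BOOT_LAUNCHED = 0, BOOT_ABORTED = 1, BOOT_RESET = 2 };

/* Hypothetical image descriptor holding precomputed check results. */
struct image {
    bool crc_ok;  /* header and data CRC both verified */
    bool load_ok; /* uncompress/copy to the load address succeeded */
};

/* Phase 1 runs while U-Boot's vectors are intact, so failures can return
 * to the prompt. Phase 2 begins at the point of no return (interrupts
 * disabled, kernel written over low memory): failures there are only
 * recoverable by a reset. ramdisk may be NULL. */
static enum boot_result boot_sketch(const struct image *kernel,
                                    const struct image *ramdisk)
{
    /* Phase 1: check every image before anything is overwritten. */
    if (!kernel->crc_ok)
        return BOOT_ABORTED;
    if (ramdisk != NULL && !ramdisk->crc_ok)
        return BOOT_ABORTED;

    /* Point of no return. */
    if (!kernel->load_ok)
        return BOOT_RESET;
    if (ramdisk != NULL && !ramdisk->load_ok)
        return BOOT_RESET;

    return BOOT_LAUNCHED; /* control passes to the kernel */
}
```

In this model, the ramdisk CRC check sits in phase 1. The case from the start of the thread (good kernel, corrupted ramdisk) therefore ends at the prompt instead of in a reset.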

> Apart from that I just noticed that 'autostart=no' does not help me, because it completely disables booting the kernel from bootm.
>
> So how can I achieve this:
>
> bootm $(kernel_addr_in_flash) $(ramdisk_addr_in_flash); run load_images_from_usb_to_ram; bootm $(kernel_addr_in_ram) $(ramdisk_addr_in_ram)
>
> So if the initial bootm fails because of invalid images, U-Boot should load the images from a USB medium and start them.

The current u-boot has ways to detect invalid images before leaping. The way of sanity is to identify invalid images before leaping, and to fill in holes in our detection as necessary. The new boot image format may also be helpful here.

> Matthias

Best regards, gvb

Wolfgang Denk

9:19 p.m.

Dear Matthias,

in message 200804211509.43558.matthias.fuchs@esd-electronics.com you wrote:

> I am wondering if bootm behaves correctly on CRC errors in kernel and/or ramdisk images. This is what I observed:

Most has already been said in previous replies, so here is just a summary of the situation:

> Same loading as above, but I made the ramdisk CRC check fail (mw 320000 12345678). I get:
>
>    ## Booting kernel from Legacy Image at 00200000 ...
>    ...
>    ## Loading init Ramdisk from Legacy Image at 00300000 ...
>    ...
>    Verifying Checksum ... Bad Data CRC
>    <system reset>
>    U-Boot 1.3.2-00450-g77dd47f (Apr 21 2008 - 14:43:23)
>
> Hmm, I expected the same behavior as for a corrupted kernel image.

This expectation is incorrect.

> So what should be the correct behavior? I would like to get back to the prompt on any CRC error. So is this a bug?

No, it is not a bug. Kernel image and ramdisk image get processed sequentially. As soon as you uncompress and copy the kernel image to its load address (typically 0x0000 for PowerPC), it will overwrite the exception vectors used by U-Boot. The next interrupt (for example, a timer interrupt) would then kill you. That's why there is a point of no return just before we start uncompressing / loading the kernel image.

Any errors after this point can only be resolved by a reset.

That's intentional and documented. There are no intentions to change this behaviour.

Best regards,

Wolfgang Denk

-- 
DENX Software Engineering GmbH, MD: Wolfgang Denk & Detlev Zundel
HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany
Phone: (+49)-8142-66989-10 Fax: (+49)-8142-66989-80 Email: wd@denx.de
Computers are not intelligent. They only think they are.

Matthias Fuchs

11:02 p.m.

Thanks for Jerry's reply and yours. I see that my expectation was incorrect and that I didn't take the words 'point of no return' seriously enough.

Now I have to find a (simple) solution to my problem:

Typically the 405 board boots from onboard flash. For historic reasons there is a separate kernel image and ramdisk image (not a multi image and nothing that is aware of any new image format). These images cannot be changed. When either or both of these images are corrupted, U-Boot should try to load both of them from a USB mass storage device. So what's the best way to do so?

1) Make bootm fail when any image has a CRC error?
2) Add a new command to check images and decide on the result?
3) ???

Any idea? I think the idea behind this is clear: when images A are not OK, boot images B.

Good night,
Matthias


-- 
-------------------------------------------------------------------------
Dipl.-Ing. Matthias Fuchs
Head of System Design

esd electronic system design gmbh
Vahrenwalder Str. 207 - 30165 Hannover - GERMANY
Phone: +49-511-37298-0 - Fax: +49-511-37298-68
Please visit our homepage http://www.esd.eu
Quality Products - Made in Germany
-------------------------------------------------------------------------
Managing Directors: Klaus Detering, Dr. Werner Schulze
Amtsgericht Hannover HRB 51373 - VAT-ID DE 115672832
-------------------------------------------------------------------------

Wolfgang Denk

22 Apr 2008

10:49 p.m.

In message 200804212302.30762.matthias.fuchs@esd-electronics.com you wrote:

> Now I have to find a (simple) solution to my problem:
>
> Typically the 405 board boots from onboard flash. For historic reasons there is a separate kernel image and ramdisk image (not a multi image and nothing that is aware of any new image format). These images cannot be changed. When either or both of these images are corrupted, U-Boot should try to load both of them from a USB mass storage device. So what's the best way to do so?

The key question here is your definition of "corrupted".

If reliability is an issue, you want to implement (1) support for a hardware watchdog combined with (2) support for a boot counter. Then you set "bootlimit" to a reasonable value and "altbootcmd" to the command required to load and boot from USB.

Such a setup will be very robust and will handle even situations where the images look good (checksums are OK etc.) but fail to work (for example, because buggy binaries or libraries were included, config files got corrupted, etc.).
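The boot counter logic described above can be modelled in a few lines of C. This is a simplified illustration under stated assumptions, not U-Boot's implementation. It assumes the counter is kept in storage that survives a watchdog reset, that it is incremented on every boot, that altbootcmd is chosen once the counter exceeds bootlimit, and that the booted system clears the counter once it is up. The struct and function names are hypothetical.

```c
#include <stdint.h>

/* Simplified model, not U-Boot's code: bootcount is assumed to live in
 * storage that survives a (watchdog) reset, and the booted system is
 * assumed to reset it to 0 once it is up and running. */
struct boot_env {
    uint32_t bootcount;
    uint32_t bootlimit;     /* 0 disables the limit */
    const char *bootcmd;    /* normal boot, e.g. from onboard flash */
    const char *altbootcmd; /* fallback, e.g. load and boot from USB */
};

/* Called once per power-on or reset: increment the counter, then pick
 * altbootcmd once the counter has exceeded bootlimit. */
static const char *select_boot_command(struct boot_env *env)
{
    env->bootcount++;
    if (env->bootlimit != 0 && env->bootcount > env->bootlimit)
        return env->altbootcmd;
    return env->bootcmd;
}
```

With bootlimit set to 3 in this model, three consecutive failed attempts run bootcmd and the fourth runs altbootcmd. In Matthias's case altbootcmd would be the command that loads and boots the images from USB. The watchdog is what turns a hang with "good" images into a reset that the counter can see.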

> Make bootm fail when any image has a CRC error?

This is trivial to do. Remember that you can always use "imi" to check images; something like

=> imi $kernel_addr && imi $ramdisk_addr && bootm $kernel_addr $ramdisk_addr

would do what you want.
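Below is a minimal C sketch of the three checks a legacy (pre-FIT) image has to pass: magic number, header CRC and data CRC. It assumes the 64-byte big-endian legacy header layout, with the header CRC computed over the header with its own CRC field zeroed. It is written for illustration and is not copied from U-Boot. The function names are hypothetical, and the real "imi" command also prints the header and handles multi-file and new-format images.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Legacy uImage header constants (64-byte big-endian header). */
#define IH_MAGIC    0x27051956u
#define IH_HDR_SIZE 64u

/* Byte offsets of the header fields this sketch needs. */
enum { OFF_MAGIC = 0, OFF_HCRC = 4, OFF_SIZE = 12, OFF_DCRC = 24 };

/* Bitwise CRC-32 (reflected polynomial 0xEDB88320), the checksum that
 * zlib's crc32() computes. Slow but self-contained. */
static uint32_t crc32_ieee(const uint8_t *p, size_t len)
{
    uint32_t crc = 0xFFFFFFFFu;

    while (len--) {
        crc ^= *p++;
        for (int i = 0; i < 8; i++)
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
    }
    return ~crc;
}

static uint32_t get_be32(const uint8_t *p)
{
    return (uint32_t)p[0] << 24 | (uint32_t)p[1] << 16 |
           (uint32_t)p[2] << 8 | (uint32_t)p[3];
}

/* Returns true if buf holds a legacy image whose magic, header CRC and
 * data CRC all check out. buf_len bounds every read. */
static bool legacy_image_ok(const uint8_t *buf, size_t buf_len)
{
    uint8_t hdr[IH_HDR_SIZE];
    uint32_t size;

    if (buf_len < IH_HDR_SIZE || get_be32(buf + OFF_MAGIC) != IH_MAGIC)
        return false;

    /* The header CRC is computed with the ih_hcrc field zeroed. */
    memcpy(hdr, buf, IH_HDR_SIZE);
    memset(hdr + OFF_HCRC, 0, 4);
    if (crc32_ieee(hdr, IH_HDR_SIZE) != get_be32(buf + OFF_HCRC))
        return false;

    /* The data CRC covers ih_size bytes following the header. */
    size = get_be32(buf + OFF_SIZE);
    if (size > buf_len - IH_HDR_SIZE)
        return false;
    return crc32_ieee(buf + IH_HDR_SIZE, size) == get_be32(buf + OFF_DCRC);
}
```

Overwriting payload bytes, as the mw commands earlier in the thread do, leaves the header intact. In that case legacy_image_ok() fails only on the data CRC, which matches the "Bad Data CRC" messages shown in the thread.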

The new image format allows for even fancier methods.

Or implement a boot counter and let the board reset on corrupt images and then use "altbootcmd".

> Add a new command to check images and decide on the result

Not needed. "iminfo" already does that.

> Any idea? I think the idea behind this is clear: when images A are not OK, boot images B.

As mentioned above, the trick question is how you define when an image is OK.

Best regards,

Wolfgang Denk


Matthias Fuchs

23 Apr 2008

10:43 a.m.

Hi Wolfgang,

thanks for your reply. That's the kind of thing I wanted to hear. Now I will start playing around ;-)

Matthias

