[U-Boot] bootm does not work if netconsole is enabled

Hi Joe, Hi Tom,
If I have netconsole enabled, I cannot boot Linux using the bootm command. This bug exists in at least 2013.01.01 and 2013.04-rc2 :/
Here is the serial console output of a successful startup:
## Booting kernel from Legacy Image at 00100000 ...
   Image Name:   Linux-3.8.0-rc1-00004-g270c0a0-d
   Image Type:   ARM Linux Kernel Image (uncompressed)
   Data Size:    2799632 Bytes = 2.7 MiB
   Load Address: 00008000
   Entry Point:  00008000
   Verifying Checksum ... OK
## Loading init Ramdisk from Legacy Image at 00800000 ...
   Image Name:
   Image Type:   ARM Linux RAMDisk Image (uncompressed)
   Data Size:    636966 Bytes = 622 KiB
   Load Address: 00000000
   Entry Point:  00000000
   Verifying Checksum ... OK
## Flattened Device Tree blob at 00700000
   Booting using the fdt blob at 0x00700000
   Loading Kernel Image ... OK
OK
   Loading Ramdisk to 03aad000, end 03b48826 ... OK
   Loading Device Tree to 03aa8000, end 03aacedd ... OK
Starting kernel ...
[.. more linux kernel output ..]
If netconsole is enabled, e.g. "stdin = stdout = stderr = nc", I see only the following output on the netconsole, and then nothing more happens:
## Booting kernel from Legacy Image at 00100000 ...
   Image Name:   Linux-3.8.0-rc1-00004-g270c0a0-d
   Image Type:   ARM Linux Kernel Image (uncompressed)
   Data Size:    2799632 Bytes = 2.7 MiB
   Load Address: 00008000
   Entry Point:  00008000
   Verifying Checksum ... OK
## Loading init Ramdisk from Legacy Image at 00800000 ...
   Image Name:
   Image Type:   ARM Linux RAMDisk Image (uncompressed)
   Data Size:    636966 Bytes = 622 KiB
   Load Address: 00000000
   Entry Point:  00000000
   Verifying Checksum ... OK
## Flattened Device Tree blob at 00700000
   Booting using the fdt blob at 0x00700000
I've tracked this down to the eth_halt() call in cmd_bootm.c:647. What is the purpose of this call? I guess the NIC should be shut down before booting Linux. On the other hand, messages are still printed after this call, and I suppose netconsole then tries to bring the network back up. If I remove this call, everything works as expected. In any case, I can definitely say that the Linux kernel is not starting, i.e. it is not only the output that stops working.
FYI, I'm using the mvgbe driver.
Let me know if I can do any more debugging or provide more information. Hopefully this will be fixed in 2013.04 :)
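The order of events described above can be sketched as a small host-side C model. Everything in it (`struct model_nic`, `model_eth_halt()`, `model_nc_puts()`, `model_bootm_tail()`) is hypothetical and only mimics the log: the last line seen on netconsole is the fdt message, eth_halt() runs, and bootm then prints several more lines, each of which is handed to a NIC that has already been stopped. The model simply counts those sends; on the real mvgbe board they apparently end in a freeze instead.

```c
#include <assert.h>

/* Hypothetical host-side model of a NIC; not a U-Boot structure. */
struct model_nic {
	int running;            /* 1 while the NIC is up, 0 after halt */
	int sends_ok;           /* packets sent while the NIC was up */
	int sends_while_halted; /* packets sent to a stopped NIC */
};

/* Stand-in for eth_halt(): stops the hardware. */
void model_eth_halt(struct model_nic *nic)
{
	nic->running = 0;
}

/* Stand-in for netconsole output; simplification: one packet per string. */
void model_nc_puts(struct model_nic *nic, const char *s)
{
	(void)s; /* the payload does not matter for this model */
	if (nic->running)
		nic->sends_ok++;
	else
		nic->sends_while_halted++; /* where the real board froze */
}

/* End of bootm, following the order of the log in the report. */
void model_bootm_tail(struct model_nic *nic)
{
	model_nc_puts(nic, "   Booting using the fdt blob at 0x00700000\n");
	model_eth_halt(nic); /* the CONFIG_NETCONSOLE eth_halt() in cmd_bootm.c */
	model_nc_puts(nic, "   Loading Kernel Image ... OK\n");
	model_nc_puts(nic, "   Loading Ramdisk to 03aad000, end 03b48826 ... OK\n");
	model_nc_puts(nic, "   Loading Device Tree to 03aa8000, end 03aacedd ... OK\n");
	model_nc_puts(nic, "\nStarting kernel ...\n");
}
```

Starting from `struct model_nic nic = {1, 0, 0};`, one call to `model_bootm_tail(&nic)` leaves `sends_ok == 1` and `sends_while_halted == 4`: everything after the halt goes to the stopped device.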

Hi Michael,
I just tested this on my Zynq target and it worked. However, you make a good point that there can be more output after the eth_halt() call. I can't imagine the stack would like that in all situations (since eth_init() will not be called again by netconsole). I think the solution is probably to disable netconsole somehow before the eth_halt() call, so that subsequent prints do not try to use Ethernet again. However, most of the console devices you might switch to, such as nulldev or silent, are conditional on the board configuration. Any thoughts on how you would like to see it solved?
Cheers, -Joe
On Tue, Apr 9, 2013 at 5:21 PM, Michael Walle michael@walle.cc wrote:
-- Michael

On Wed, April 10, 2013 03:51, Joe Hershberger wrote:
Hi Joe,
disabling the network console before control is handed over to an operating system sounds reasonable.
If I understand you correctly, the network will only be halted once the bootloader starts an operating system. Then what do you think about making either the nc_send_packet() or the nc_putc()/nc_puts() function a no-op when the network is halted?
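Michael's suggestion can be sketched as follows: a software flag records that the network was halted, and the netconsole send path drops output once the flag is set. This is a hypothetical host-side model (`model_net_halted`, `model_eth_halt()` and `model_nc_send_packet()` are not U-Boot names), and it quietly assumes that the flag always matches the hardware, which is exactly the assumption Joe questions in his reply.

```c
#include <assert.h>

/* Hypothetical host-side model; none of these names exist in U-Boot. */
struct model_nic {
	int running;            /* actual hardware state */
	int sends_while_halted; /* packets that reached a stopped NIC */
};

/* Software record of the halt that the proposal relies on. */
static int model_net_halted;

/* Stand-in for eth_halt() that also records the halt. */
void model_eth_halt(struct model_nic *nic)
{
	nic->running = 0;
	model_net_halted = 1;
}

/*
 * Proposed netconsole send path: a no-op once the halt was recorded.
 * Returns 1 if the packet was handed to the NIC, 0 if it was dropped.
 * If the flag and nic->running ever disagree, output can still reach
 * a stopped NIC.
 */
int model_nc_send_packet(struct model_nic *nic, const char *buf, int len)
{
	(void)buf;
	(void)len;
	if (model_net_halted)
		return 0;
	if (!nic->running)
		nic->sends_while_halted++;
	return 1;
}
```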

Hi Michael,
On Wed, Apr 10, 2013 at 5:07 AM, Michael Walle michael@walle.cc wrote:
If I understand you correctly, the network will only be halted once the bootloader starts an operating system. Then what do you think about making either the nc_send_packet() or the nc_putc()/nc_puts() function a no-op when the network is halted?
It's not quite that simple, since the state does not really reflect the hardware (I know... that's my fault). I can try to fix this for next release, but it is probably too risky to put into the April release. In the meantime, a simple workaround is to use silent or nulldev on your target and turn off netconsole as part of your boot script (before you run bootm).
Cheers, -Joe

Hi Joe,
On Wednesday, 10 April 2013, 18:13:57, Joe Hershberger wrote:
If I understand you correctly, the network will only be halted once the bootloader starts an operating system. Then what do you think about making either the nc_send_packet() or the nc_putc()/nc_puts() function a no-op when the network is halted?
It's not quite that simple since the state does not really reflect the hardware (I know... that's my fault). I can try to fix this for next release, but it is probably too risky to put into the April release.
Agreed. Let me know if i can test something for you.
In the meantime, a simple workaround is to use silent or nulldev on your target and turn off netconsole as part of your boot script (before you run bootm).
I worked around this bug by setting stdout back to serial just before the bootm command.

On 2013-04-10 18:13, Joe Hershberger wrote:
It's not quite that simple since the state does not really reflect the hardware (I know... that's my fault). I can try to fix this for next release, but it is probably too risky to put into the April release.
Ping :)

On Wednesday, 10 April 2013, 18:13:57, Joe Hershberger wrote:
It's not quite that simple since the state does not really reflect the hardware (I know... that's my fault). I can try to fix this for next release, but it is probably too risky to put into the April release.
ping #2 :)

When netconsole is active, some boards fail to boot. This patch enables only the serial console before control is handed over to another operating system.
Signed-off-by: Frédéric Leroy fredo@starox.org
---
Hello,
I am facing the same problem with LaCie kirkwood boards. I took a simple approach for fixing this issue. This works for me ... Any comments are welcome :)
Frédéric
 common/cmd_bootm.c | 14 ++++++++++++++
 1 file changed, 14 insertions(+)

diff --git a/common/cmd_bootm.c b/common/cmd_bootm.c
index 15f4599..81e8322 100644
--- a/common/cmd_bootm.c
+++ b/common/cmd_bootm.c
@@ -62,6 +62,17 @@
 #include <linux/lzo.h>
 #endif /* CONFIG_LZO */
 
+
+#if defined(CONFIG_NETCONSOLE)
+#include <iomux.h>
+void console_set_serial_unconditionally(void)
+{
+	iomux_doenv(stdin, "serial");
+	iomux_doenv(stdout, "serial");
+	iomux_doenv(stderr, "serial");
+}
+#endif
+
 DECLARE_GLOBAL_DATA_PTR;
 
 #ifndef CONFIG_SYS_BOOTM_LEN
@@ -577,6 +588,7 @@ static int do_bootm_subcommand(cmd_tbl_t *cmdtp, int flag, int argc,
 		 * Stop the ethernet stack if NetConsole could have
 		 * left it up
 		 */
+		console_set_serial_unconditionally("nc");
 		eth_halt();
 #endif
 		arch_preboot_os();
@@ -645,6 +657,7 @@ int do_bootm(cmd_tbl_t *cmdtp, int flag, int argc, char * const argv[])
 
 #ifdef CONFIG_NETCONSOLE
 	/* Stop the ethernet stack if NetConsole could have left it up */
+	console_set_serial_unconditionally("nc");
 	eth_halt();
 #endif
 
@@ -1849,6 +1862,7 @@ static int do_bootz(cmd_tbl_t *cmdtp, int flag, int argc, char * const argv[])
 
 #ifdef CONFIG_NETCONSOLE
 	/* Stop the ethernet stack if NetConsole could have left it up */
+	console_set_serial_unconditionally("nc");
 	eth_halt();
 #endif
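The idea behind this patch, routing stdin, stdout and stderr back to the serial port before eth_halt() so that later prints never reach netconsole, can be modeled on the host as below. The types and functions are hypothetical stand-ins, not U-Boot's iomux API; `model_set_serial()` only mirrors the intent of `console_set_serial_unconditionally()`.

```c
#include <assert.h>

/* Hypothetical model of console routing; not U-Boot's iomux code. */
enum model_dev { MODEL_SERIAL, MODEL_NC };

struct model_console {
	enum model_dev in, out, err; /* where stdin/stdout/stderr point */
	int nic_running;             /* 0 once eth_halt() has run */
	int serial_chars;            /* characters written to the UART */
	int nc_chars_while_halted;   /* characters sent to a stopped NIC */
};

/* Same intent as console_set_serial_unconditionally(). */
void model_set_serial(struct model_console *c)
{
	c->in = MODEL_SERIAL;
	c->out = MODEL_SERIAL;
	c->err = MODEL_SERIAL;
}

/* Stand-in for eth_halt(). */
void model_eth_halt(struct model_console *c)
{
	c->nic_running = 0;
}

/* Write a string to whatever stdout currently points at. */
void model_puts(struct model_console *c, const char *s)
{
	for (; *s != '\0'; s++) {
		if (c->out == MODEL_SERIAL)
			c->serial_chars++;
		else if (!c->nic_running)
			c->nc_chars_while_halted++;
		/* netconsole output while the NIC is up is not counted */
	}
}
```

Switching to serial first and then halting means a later `model_puts(&c, "OK\n")` lands on the UART (three characters) and nothing reaches the stopped NIC; skipping the switch sends all three characters to the halted device.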

Dear Frédéric Leroy,
In message 1373192736-5014-1-git-send-email-fredo@starox.org you wrote:
When netconsole is active, some boards fail to boot. This patch enables only the serial console before control is handed over to another operating system.
I really hate adding such fixes without understanding the exact nature of the problem.
Could you please be so kind as to explain in which way booting fails when netconsole is active? What exactly are the "some boards" that fail this way? And why does this not appear to be a problem on other boards (or is it)? And why can the "some boards" not be fixed to behave like the rest of the boards, where it's not a problem?

On Sun, Jul 07, 2013 at 12:25:36PM +0200, Frédéric Leroy wrote:
When netconsole is active, some boards fail to boot. This patch enables only the serial console before control is handed over to another operating system.
Signed-off-by: Frédéric Leroy fredo@starox.org
Hello,
I am facing the same problem with LaCie kirkwood boards. I took a simple approach for fixing this issue. This works for me ... Any comments are welcome :)
Can you please re-base this to mainline and re-post? Also, can you expand on the comments in the commit message to explain the problem with netconsole and thus why this fixes it?

With CONFIG_NETCONSOLE enabled, bootm and bootz call eth_halt() before giving control to another operating system, but the netconsole state machine does not take this into account. Thus, netconsole calls network functions of a halted network device, making the whole system freeze. Rather than modifying the netconsole state machine, we just unregister the current network device before booting. This works because nc_send_packet() verifies that the current network device is not NULL.
Signed-off-by: Frédéric Leroy fredo@starox.org
---
 common/cmd_bootm.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)
Hello Tom,
I have made a cleaner version of my patch against master. I didn't touch the netconsole code because I don't fully understand it.
Regards,
diff --git a/common/cmd_bootm.c b/common/cmd_bootm.c
index 15f4599..beb25b6 100644
--- a/common/cmd_bootm.c
+++ b/common/cmd_bootm.c
@@ -577,7 +577,7 @@ static int do_bootm_subcommand(cmd_tbl_t *cmdtp, int flag, int argc,
 		 * Stop the ethernet stack if NetConsole could have
 		 * left it up
 		 */
-		eth_halt();
+		eth_unregister(eth_get_dev());
 #endif
 		arch_preboot_os();
 		boot_fn(BOOTM_STATE_OS_GO, argc, argv, &images);
@@ -645,7 +645,7 @@ int do_bootm(cmd_tbl_t *cmdtp, int flag, int argc, char * const argv[])
 
 #ifdef CONFIG_NETCONSOLE
 	/* Stop the ethernet stack if NetConsole could have left it up */
-	eth_halt();
+	eth_unregister(eth_get_dev());
 #endif
 
 #if defined(CONFIG_CMD_USB)
@@ -1849,7 +1849,7 @@ static int do_bootz(cmd_tbl_t *cmdtp, int flag, int argc, char * const argv[])
 
 #ifdef CONFIG_NETCONSOLE
 	/* Stop the ethernet stack if NetConsole could have left it up */
-	eth_halt();
+	eth_unregister(eth_get_dev());
 #endif
 
 #if defined(CONFIG_CMD_USB)
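The commit message relies on nc_send_packet() returning early when there is no current Ethernet device. A hypothetical host-side model of that guard follows; `model_current`, `model_eth_get_dev()`, `model_eth_unregister()` and `model_nc_send_packet()` only echo the U-Boot names and are not the real implementations. It assumes a single registered device, as on the boards in this thread, so that unregistering it leaves no current device.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model; the names echo U-Boot but are not its code. */
struct model_eth_device {
	int packets; /* packets handed to this device */
};

/* Plays the role of the "current Ethernet device" pointer. */
static struct model_eth_device *model_current;

struct model_eth_device *model_eth_get_dev(void)
{
	return model_current;
}

/* Assumes a single registered device: removing it leaves none current. */
void model_eth_unregister(struct model_eth_device *dev)
{
	if (model_current == dev)
		model_current = NULL;
}

/* The guard the commit message relies on: no device, no packet. */
int model_nc_send_packet(const char *buf, int len)
{
	struct model_eth_device *dev = model_eth_get_dev();

	(void)buf;
	(void)len;
	if (dev == NULL)
		return -1; /* console output is silently dropped */
	dev->packets++;
	return 0;
}
```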

With CONFIG_NETCONSOLE enabled, bootm and bootz call eth_halt() before giving control to another operating system, but the netconsole state machine does not take this into account. Thus, netconsole calls network functions of a halted network device, making the whole system freeze. Rather than modifying the netconsole state machine, we just unregister the current network device before booting. This works because nc_send_packet() verifies that the current network device is not NULL.
Signed-off-by: Frédéric Leroy fredo@starox.org
---
Sorry, I was dumb. I don't know why I removed the call to eth_halt(); eth_unregister() doesn't halt the device.
Changes in v2: don't remove the call to eth_halt()

 common/cmd_bootm.c | 3 +++
 1 file changed, 3 insertions(+)
diff --git a/common/cmd_bootm.c b/common/cmd_bootm.c
index 15f4599..549a914 100644
--- a/common/cmd_bootm.c
+++ b/common/cmd_bootm.c
@@ -578,6 +578,7 @@ static int do_bootm_subcommand(cmd_tbl_t *cmdtp, int flag, int argc,
 		 * left it up
 		 */
 		eth_halt();
+		eth_unregister(eth_get_dev());
 #endif
 		arch_preboot_os();
 		boot_fn(BOOTM_STATE_OS_GO, argc, argv, &images);
@@ -646,6 +647,7 @@ int do_bootm(cmd_tbl_t *cmdtp, int flag, int argc, char * const argv[])
 #ifdef CONFIG_NETCONSOLE
 	/* Stop the ethernet stack if NetConsole could have left it up */
 	eth_halt();
+	eth_unregister(eth_get_dev());
 #endif
 
 #if defined(CONFIG_CMD_USB)
@@ -1850,6 +1852,7 @@ static int do_bootz(cmd_tbl_t *cmdtp, int flag, int argc, char * const argv[])
 #ifdef CONFIG_NETCONSOLE
 	/* Stop the ethernet stack if NetConsole could have left it up */
 	eth_halt();
+	eth_unregister(eth_get_dev());
 #endif
 
 #if defined(CONFIG_CMD_USB)
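The point of this v2 is that unregistering and halting do different things. In the hypothetical host-side model below (again not U-Boot code), `model_eth_unregister()` only drops the pointer netconsole would use, while `model_eth_halt()` stops the device. The v1 sequence therefore leaves the NIC running. The halt also has to come first, because once the device is unregistered the model's halt has nothing left to act on.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model; not U-Boot's implementation. */
struct model_eth_device {
	int running; /* 1 while the hardware is active */
};

/* Plays the role of the "current Ethernet device" pointer. */
static struct model_eth_device *model_current;

/* Stops the current device's hardware, if there is one. */
void model_eth_halt(void)
{
	if (model_current != NULL)
		model_current->running = 0;
}

/* Forgets the device; the hardware state is left untouched. */
void model_eth_unregister(struct model_eth_device *dev)
{
	if (model_current == dev)
		model_current = NULL;
}

/* v1 of the patch: the unregister replaced the halt. */
void model_preboot_v1(void)
{
	model_eth_unregister(model_current);
}

/* v2 of the patch: halt first, then unregister. */
void model_preboot_v2(void)
{
	model_eth_halt();
	model_eth_unregister(model_current);
}
```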

On Tue, Sep 10, 2013 at 12:02:31PM +0200, Frederic Leroy wrote:
With CONFIG_NETCONSOLE enabled, bootm and bootz call eth_halt() before giving control to another operating system, but the netconsole state machine does not take this into account. Thus, netconsole calls network functions of a halted network device, making the whole system freeze. Rather than modifying the netconsole state machine, we just unregister the current network device before booting. This works because nc_send_packet() verifies that the current network device is not NULL.
Signed-off-by: Frédéric Leroy fredo@starox.org
Applied to u-boot/master, thanks!

When netconsole is active, some boards fail to boot. This patch enables only the serial console before control is handed over to another operating system.
Signed-off-by: Frédéric Leroy fredo@starox.org
---
Sorry for the noise, I tend to post faster than my own shadow ...
Changes in v2:
- remove unused argument from console_set_serial_unconditionally call
- fix platforms where CONFIG_CONSOLE_MUX is not defined
 common/cmd_bootm.c | 15 +++++++++++++++
 1 file changed, 15 insertions(+)

diff --git a/common/cmd_bootm.c b/common/cmd_bootm.c
index 15f4599..e75c692 100644
--- a/common/cmd_bootm.c
+++ b/common/cmd_bootm.c
@@ -62,6 +62,18 @@
 #include <linux/lzo.h>
 #endif /* CONFIG_LZO */
 
+#if defined(CONFIG_NETCONSOLE)
+#include <iomux.h>
+void console_set_serial_unconditionally(void)
+{
+#if defined(CONFIG_CONSOLE_MUX)
+	iomux_doenv(stdin, "serial");
+	iomux_doenv(stdout, "serial");
+	iomux_doenv(stderr, "serial");
+#endif
+}
+#endif
+
 DECLARE_GLOBAL_DATA_PTR;
 
 #ifndef CONFIG_SYS_BOOTM_LEN
@@ -577,6 +589,7 @@ static int do_bootm_subcommand(cmd_tbl_t *cmdtp, int flag, int argc,
 		 * Stop the ethernet stack if NetConsole could have
 		 * left it up
 		 */
+		console_set_serial_unconditionally();
 		eth_halt();
 #endif
 		arch_preboot_os();
@@ -645,6 +658,7 @@ int do_bootm(cmd_tbl_t *cmdtp, int flag, int argc, char * const argv[])
 
 #ifdef CONFIG_NETCONSOLE
 	/* Stop the ethernet stack if NetConsole could have left it up */
+	console_set_serial_unconditionally();
 	eth_halt();
 #endif
 
@@ -1849,6 +1863,7 @@ static int do_bootz(cmd_tbl_t *cmdtp, int flag, int argc, char * const argv[])
 
 #ifdef CONFIG_NETCONSOLE
 	/* Stop the ethernet stack if NetConsole could have left it up */
+	console_set_serial_unconditionally();
 	eth_halt();
 #endif
Participants (5):
- Frédéric Leroy
- Joe Hershberger
- Michael Walle
- Tom Rini
- Wolfgang Denk