[U-Boot-Users] [PATCH] 8xx: fix initialation bug/race condition between u-boot & linux

Hi all
Sorry in advance for cross-posting but this is a problem that affects both the boot loader and the kernel.
In my 8xx board we use u-boot in order to boot linux.
My troubles began when for dpram conflict reasons we moved from using a console on an SMC to a max3100 SPI/UART converter chip.
I was still using the SMC console in u-boot, but when tried to use this chip as a driver for the boot console and removing the CPM console, strange panics appeared in the ethernet driver.
After some interesting debugging sessions the problem was discovered to be this:
The u-boot loader upon finishing the decompression of the kernel image it just jumps into the kernel without resetting the CP. The kernel however resets the CP only when configured to apply a u-patch. The UART in the CP is still active however, and the descriptors in dpram end up overlapping.
The attached patch fixes the common problem by performing the CP reset every time.
This is only a bandaid however, because under network load at the time of booting, i.e. broadcast packets or rebooting after a crash and another end sending non TCP packets, we still crash. It is still possible for a packet to be received and placed in memory by DMA before the kernel reaching the point where the ethernet controller is initialized.
[boot jumps into kernel ] [kernel starts executing ] <--- incoming packet write in memory XXX possible corruption XXX
[init ethernet controller]
The proper way to address the problem is for the loader to stop any DMA capable peripherals before executing the kernel.
The fix for my case is a simple reset of the CP, but I believe that this is a problem that should be looked upon by the maintainers.
IMHO it is good practice for the kernel too to issue a reset to it's peripherals as soon as possible in order to minimize the chances of that happening.
Regards
Pantelis
--- commproc-orig.c Tue Jun 17 15:12:27 2003 +++ commproc.c Tue Jun 17 15:13:31 2003 @@ -115,7 +115,6 @@ imp = (immap_t *)IMAP_ADDR; commproc = (cpm8xx_t *)&imp->im_cpm;
-#ifdef CONFIG_UCODE_PATCH /* Perform a reset. */ commproc->cp_cpcr = (CPM_CR_RST | CPM_CR_FLG); @@ -124,6 +123,7 @@ */ while (commproc->cp_cpcr & CPM_CR_FLG);
+#ifdef CONFIG_UCODE_PATCH cpm_load_patch(imp); #endif

On Tue, Jun 17, 2003 at 03:43:34PM +0300, Pantelis Antoniou wrote:
The u-boot loader upon finishing the decompression of the kernel image it just jumps into the kernel without resetting the CP. The kernel however resets the CP only when configured to apply a u-patch. The UART in the CP is still active however, and the descriptors in dpram end up overlapping.
I believe that the assumption here, which other firmwares follow, is that this would be taken care of already. Dan, do you recall making any assumptions about this either way?
[snip]
This is only a bandaid however, because under network load at the time of booting, i.e. broadcast packets or rebooting after a crash and another end sending non TCP packets, we still crash.
[snip]
The proper way to address the problem is for the loader to stop any DMA capable peripherals before executing the kernel.
Does doing this fix, fix all of your problems?

Tom Rini wrote:
On Tue, Jun 17, 2003 at 03:43:34PM +0300, Pantelis Antoniou wrote:
The u-boot loader upon finishing the decompression of the kernel image it just jumps into the kernel without resetting the CP. The kernel however resets the CP only when configured to apply a u-patch. The UART in the CP is still active however, and the descriptors in dpram end up overlapping.
I believe that the assumption here, which other firmwares follow, is that this would be taken care of already. Dan, do you recall making any assumptions about this either way?
[snip]
This is only a bandaid however, because under network load at the time of booting, i.e. broadcast packets or rebooting after a crash and another end sending non TCP packets, we still crash.
[snip]
The proper way to address the problem is for the loader to stop any DMA capable peripherals before executing the kernel.
Does doing this fix, fix all of your problems?
Yes, this fixed my problems. But this is mostly luck.
It just happens that the reception buffer for the ethernet packet is high enough in memory that no memory corruption takes place. I'm not sure however if we're completely safe in all cases.
I think it's just proper defensive programming to always do a CP reset in that case, and as soon as possible. The best thing we could do is do the reset at head_8xx.S.
Regards
Pantelis

In message 3EF00E16.3050604@intracom.gr you wrote:
I think it's just proper defensive programming to always do a CP reset in that case, and as soon as possible.
Then probably the boot loader should do it before starting the Linux kernel.
The best thing we could do is do the reset at head_8xx.S.
Note that such a change will kill all attempts for ealy debug output over the serial console port.
Wolfgang Denk

Wolfgang Denk wrote:
In message 3EF00E16.3050604@intracom.gr you wrote:
I think it's just proper defensive programming to always do a CP reset in that case, and as soon as possible.
Then probably the boot loader should do it before starting the Linux kernel.
Yes, this is the best thing.
I propose we add two new functions to u-boot.
1. cpu_peripheral_reset() Do a reset for the on-chip peripherals of the cpu. For the 8xx family a CP reset is sufficient.
2. board_peripheral_reset() A conditional function controlled by CONFIG_HAVE_BOARD_PERIPHERAL_RESET. Do the reset for board specific peripherals.
And have them called before jumping into the kernel.
But this is a far reaching change for u-boot since at least the cpu_peripheral_reset() must be implemented for all the cpus that u-boot supports. And I'm not talking about the boards.
The decision is yours Wolfgang.
The best thing we could do is do the reset at head_8xx.S.
Note that such a change will kill all attempts for ealy debug output over the serial console port.
That is truly unfortunate. Perhaps we can avoid the full CP reset and do only the DMA peripheral reset in this case. The buffers for the uart are in dpram anyway, so there no chance for a crash.
Wolfgang Denk
Regards
Pantelis

Hi Wolfgang,
Wolfgang Denk wrote:
In message 3EF00E16.3050604@intracom.gr you wrote:
I think it's just proper defensive programming to always do a CP reset in that case, and as soon as possible.
Then probably the boot loader should do it before starting the Linux kernel.
This would be a mostly hidden requirement for any boot loader towards the ppc linux kernel.
The best thing we could do is do the reset at head_8xx.S.
Note that such a change will kill all attempts for ealy debug output over the serial console port.
That would also be the case when the boot loader does it. The kernel will initialize the uart again very early isn't it? And is a CP reset enough? I think I remember you also have to unlock the pram section that is used for uCode.
just some thoughts,
Jaap-Jan
Wolfgang Denk
-- Software Engineering: Embedded and Realtime Systems, Embedded Linux Phone: (+49)-8142-4596-87 Fax: (+49)-8142-4596-88 Email: wd@denx.de The one who says it cannot be done should never interrupt the one who is doing it.
** Sent via the linuxppc-embedded mail list. See http://lists.linuxppc.org/

On Wed, Jun 18, 2003 at 10:32:29AM +0200, Jaap-Jan Boor wrote:
Hi Wolfgang,
Wolfgang Denk wrote:
In message 3EF00E16.3050604@intracom.gr you wrote:
I think it's just proper defensive programming to always do a CP reset in that case, and as soon as possible.
Then probably the boot loader should do it before starting the Linux kernel.
This would be a mostly hidden requirement for any boot loader towards the ppc linux kernel.
... or this is already the case (which is why I ping'ed Dan who would probably recall such a thing :)). If so, it would probably be a good idea to make a note in head_8xx.S about that assumption.

Pantelis Antoniou wrote:
The u-boot loader upon finishing the decompression of the kernel image it just jumps into the kernel without resetting the CP. The kernel however resets the CP only when configured to apply a u-patch. The UART in the CP is still active however, and the descriptors in dpram end up overlapping.
The UART console has a number of subtle assumptions built around the initialization to support kgdb/xmon debugging. The quick answer is we don't want to reset the CPM so we can proper support this debugging. This has all been described in archives before...........
When the kernel is first booted, kgdb/xmon use the CPM as it was set up by the boot rom. There is a second initialization of the UART driver, but before the console is initialized. This changes the BDs, but the UART still operates for kgdb/xmon. The final stage initialization occurs when the console is initialized, and all of the "normal path" debugging and messages can occur after this point.
The attached patch fixes the common problem by performing the CP reset every time.
We don't want to do this and shouldn't need to do so except in the case of loading microcode patches.
This is only a bandaid however, because under network load at the time of booting, i.e. broadcast packets or rebooting after a crash and another end sending non TCP packets, we still crash.
This is a common problem of all bootloaders that don't disable the Ethernet controller after they have performed the network load operation. You must disable any kind of DMA operation that could write into memory while any software outside of the boot rom is initializing. Performing the CPM reset in the kernel doesn't solve this problem, it only makes the window of opportunity smaller. With sufficient network traffic and coincidental buffer locations, you will always crash even with the CPM reset in the kernel.
Thanks.
-- Dan

Dan Malek wrote:
Pantelis Antoniou wrote:
The u-boot loader upon finishing the decompression of the kernel image it just jumps into the kernel without resetting the CP. The kernel however resets the CP only when configured to apply a u-patch. The UART in the CP is still active however, and the descriptors in dpram end up overlapping.
The UART console has a number of subtle assumptions built around the initialization to support kgdb/xmon debugging. The quick answer is we don't want to reset the CPM so we can proper support this debugging. This has all been described in archives before...........
When the kernel is first booted, kgdb/xmon use the CPM as it was set up by the boot rom. There is a second initialization of the UART driver, but before the console is initialized. This changes the BDs, but the UART still operates for kgdb/xmon. The final stage initialization occurs when the console is initialized, and all of the "normal path" debugging and messages can occur after this point.
OK. Shouldn't there be at least a comment in the code to mention that?
The attached patch fixes the common problem by performing the CP reset every time.
We don't want to do this and shouldn't need to do so except in the case of loading microcode patches.
This is only a bandaid however, because under network load at the time of booting, i.e. broadcast packets or rebooting after a crash and another end sending non TCP packets, we still crash.
This is a common problem of all bootloaders that don't disable the Ethernet controller after they have performed the network load operation. You must disable any kind of DMA operation that could write into memory while any software outside of the boot rom is initializing. Performing the CPM reset in the kernel doesn't solve this problem, it only makes the window of opportunity smaller. With sufficient network traffic and coincidental buffer locations, you will always crash even with the CPM reset in the kernel.
Yep, it is a job for the bootloader. Wolfgang?
Thanks.
-- Dan

In message 3EF156E9.9000600@intracom.gr you wrote:
OK. Shouldn't there be at least a comment in the code to mention that?
Feel free to submit a patch.
This is a common problem of all bootloaders that don't disable the Ethernet controller after they have performed the network load operation. You must disable any kind of DMA operation that could write into memory while any software outside of the boot rom is initializing. Performing the CPM reset in the kernel doesn't solve this problem, it only makes the window of opportunity smaller. With sufficient network traffic and coincidental buffer locations, you will always crash even with the CPM reset in the kernel.
Yep, it is a job for the bootloader. Wolfgang?
Can you please explain again under which conditions the problem happened? As Dan explained there is good reason NOT to perform a global CPM reset, but to leave the console UART enabled (which should be not critical). Regarding network transfers I think that U-Boot will call eth_halt() after all (attempted) network transfers; eth_halt() in turn calls driver functions (like scc_halt() or fec_halt()) that should disable the network controller.
You wrote:
My troubles began when for dpram conflict reasons we moved from using a console on an SMC to a max3100 SPI/UART converter chip.
I have never seen a crash like the one you reported, and in fact I am not convinced that it is a global problem. Maybe the problem is only introduced by your own SPI/UART driver code?
In any case: if you find a place where a CPM device is not shut down before booting Linux please submit a patch. But I would like to avoid a global CPM reset.
Best regards,
Wolfgang Denk

Wolfgang Denk wrote:
In message 3EF156E9.9000600@intracom.gr you wrote:
OK. Shouldn't there be at least a comment in the code to mention that?
Feel free to submit a patch.
OK
This is a common problem of all bootloaders that don't disable the Ethernet controller after they have performed the network load operation. You must disable any kind of DMA operation that could write into memory while any software outside of the boot rom is initializing. Performing the CPM reset in the kernel doesn't solve this problem, it only makes the window of opportunity smaller. With sufficient network traffic and coincidental buffer locations, you will always crash even with the CPM reset in the kernel.
Yep, it is a job for the bootloader. Wolfgang?
Can you please explain again under which conditions the problem happened? As Dan explained there is good reason NOT to perform a global CPM reset, but to leave the console UART enabled (which should be not critical). Regarding network transfers I think that U-Boot will call eth_halt() after all (attempted) network transfers; eth_halt() in turn calls driver functions (like scc_halt() or fec_halt()) that should disable the network controller.
You wrote:
My troubles began when for dpram conflict reasons we moved from using a console on an SMC to a max3100 SPI/UART converter chip.
I have never seen a crash like the one you reported, and in fact I am not convinced that it is a global problem. Maybe the problem is only introduced by your own SPI/UART driver code?
In any case: if you find a place where a CPM device is not shut down before booting Linux please submit a patch. But I would like to avoid a global CPM reset.
The problem was introduced while testing the SPI/UART driver and using the SMC uart on u-boot. Since the SMC uart was disabled in linux it's dpram memory area was used by the next CP driver, the ethernet. However it was still active and both the ethernet and SMC ended up using the same dpram.
When both u-boot and the kernel use the SPI/UART for console there is no problem.
Granted it is an obscure case, but the whole thing was surprising.
And since the whole mess was produced by kgdb, which was not configured in could we at least reset the CP if we don't use kgdb? It will save some poor sucker some hair.
Best regards,
Wolfgang Denk
Regards
Pantelis

In message 3EF180AB.3030304@intracom.gr you wrote:
The problem was introduced while testing the SPI/UART driver and using the SMC uart on u-boot. Since the SMC uart was disabled in linux it's dpram memory area was used by the next CP driver, the ethernet. However it was still active and both the ethernet and SMC ended up using the same dpram.
It is indeed an unusual configuration to use different devices for console in the boot loader and in Linux.
When both u-boot and the kernel use the SPI/UART for console there is no problem.
Granted it is an obscure case, but the whole thing was surprising.
If you do such a thing you are supposed to know what you are doing. I don't think it is worth the effort to protect against all possible things.
And since the whole mess was produced by kgdb, which was not configured in could we at least reset the CP if we don't use kgdb? It will save some poor sucker some hair.
I disagree. As Dan Malek has explained, there is good reason NOT to perform a CPM reset. Instead, the console port _shall_ be passed initialized and usable to Linux. I think it is not wise to assume that only kgdb may use it.
A CPM reset is a too drastic measure. Neither U-Boot nor Linux are designed this way. Instead, normally a driver will enable just a single port/channel at init, and disable it again in the release/close function - see for example the ethernet drivers. The fact that this is not done for the console is intentional. If you use a SMC UART port in U-Boot and do not pass this as console up to Linux, it is IMHO your responsibility to disable exactly this channel only.
But this is an exotic setup, and IMHO not worth a patch.
Best regards,
Wolfgang Denk

On Thu, Jun 19, 2003 at 09:23:37AM +0300, Pantelis Antoniou wrote:
Dan Malek wrote:
Pantelis Antoniou wrote:
The u-boot loader upon finishing the decompression of the kernel image it just jumps into the kernel without resetting the CP. The kernel however resets the CP only when configured to apply a u-patch. The UART in the CP is still active however, and the descriptors in dpram end up overlapping.
The UART console has a number of subtle assumptions built around the initialization to support kgdb/xmon debugging. The quick answer is we don't want to reset the CPM so we can proper support this debugging. This has all been described in archives before...........
When the kernel is first booted, kgdb/xmon use the CPM as it was set up by the boot rom. There is a second initialization of the UART driver, but before the console is initialized. This changes the BDs, but the UART still operates for kgdb/xmon. The final stage initialization occurs when the console is initialized, and all of the "normal path" debugging and messages can occur after this point.
OK. Shouldn't there be at least a comment in the code to mention that?
Yes, there should. Throwing that into head_8xx.S now..
participants (5)
-
Dan Malek
-
Jaap-Jan Boor
-
Pantelis Antoniou
-
Tom Rini
-
Wolfgang Denk