[PATCH] efi: Call bootm_disable_interrupts earlier in efi_exit_boot_services

If we look at the path that bootm/booti take when preparing to boot the OS, we see that as part of (or prior to calling do_bootm_states, explicitly) the process, bootm_disable_interrupts() is called prior to announce_and_cleanup() which is where udc_disconnect() / board_quiesce_devices() / dm_remove_devices_flags() are called from. In the EFI path, these are called afterwards. In efi_exit_boot_services() however we have been calling bootm_disable_interrupts() after the above functions, as part of ensuring that we disable interrupts as required by the spec. However, bootm_disable_interrupts() is also where we go and call usb_stop(). While this has been fine before, on the TI J721E platform this leads us to an exception. This exception seems likely to be the case that we're trying to stop devices that we have already disabled clocks for. The most direct way to handle this particular problem is to make EFI behave like the do_bootm_states() process and ensure we call bootm_disable_interrupts() prior to ending up in usb_stop().
Cc: Ilias Apalodimas ilias.apalodimas@linaro.org Cc: Heinrich Schuchardt xypron.glpk@gmx.de Cc: Simon Glass sjg@chromium.org Suggested-by: Ilias Apalodimas ilias.apalodimas@linaro.org Signed-off-by: Tom Rini trini@konsulko.com --- First up, as my wording above implies, I'm assuming rather than being 100% confident in why calling usb_stop() is leading to the exception I get. It's this call: /* Locate root hub device */ device_find_first_child(bus, &rh); that causes the exception. This board is also a little odd in that, borrowing from dm_dump_all(): nop 5 [ ] cdns-ti | |-- cdns-usb@4114000 usb 0 [ + ] cdns-usb3-host | | `-- usb@6400000 usb_hub 0 [ + ] usb_hub | | `-- usb_hub usb_hub 1 [ + ] usb_hub | | `-- usb_hub usb_mass_s 0 [ + ] usb_mass_storage | | `-- usb_mass_storage blk 2 [ + ] usb_storage_blk | | `-- usb_mass_storage.lun0 and physically, only that mass storage device is attached to the board itself, the rest is on-device.
Second, while talking with Ilias he's said he'll see if there can be some common function / abstractions done here between the do_bootm_states() code and the efi_exit_boot_services() code as this change shows other common code that's in arch/*/lib/bootm.c. The call to bootm_disable_interrupts() however I have tried to make clear is not handled in a common way as bootm/booti spell out the call as they don't use the BOOTM_STATE_LOADOS flag.
Third, I'm not 100% sure that 3d71c81a9bb0 is still the correct / ideal thing to be doing, and that's what usb_stop() is all about, at that point in the boot cycle. It might well be the kind of quirk that we have handled now, via DM.
Fourth, I really would like to figure out a fix here appropriate to v2022.01 and I think this is the least invasive approach. --- lib/efi_loader/efi_boottime.c | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-)
diff --git a/lib/efi_loader/efi_boottime.c b/lib/efi_loader/efi_boottime.c index 1823990d9bd5..0485870d34d9 100644 --- a/lib/efi_loader/efi_boottime.c +++ b/lib/efi_loader/efi_boottime.c @@ -2154,6 +2154,7 @@ static efi_status_t EFIAPI efi_exit_boot_services(efi_handle_t image_handle, }
if (!efi_st_keep_devices) { + bootm_disable_interrupts(); if (IS_ENABLED(CONFIG_USB_DEVICE)) udc_disconnect(); board_quiesce_devices(); @@ -2166,9 +2167,6 @@ static efi_status_t EFIAPI efi_exit_boot_services(efi_handle_t image_handle, /* Fix up caches for EFI payloads if necessary */ efi_exit_caches();
- /* This stops all lingering devices */ - bootm_disable_interrupts(); - /* Disable boot time services */ systab.con_in_handle = NULL; systab.con_in = NULL;

Am 19. November 2021 22:33:04 MEZ schrieb Tom Rini trini@konsulko.com:
If we look at the path that bootm/booti take when preparing to boot the OS, we see that as part of (or prior to calling do_bootm_states, explicitly) the process, bootm_disable_interrupts() is called prior to announce_and_cleanup() which is where udc_disconnect() / board_quiesce_devices() / dm_remove_devices_flags() are called from. In the EFI path, these are called afterwards. In efi_exit_boot_services() however we have been calling bootm_disable_interrupts() after the above functions, as part of ensuring that we disable interrupts as required by the spec. However, bootm_disable_interrupts() is also where we go and call usb_stop(). While this has been fine before, on the TI J721E platform this leads us to an exception. This exception seems likely to be the case that we're trying to stop devices that we have already disabled clocks for. The most direct way to handle this particular
This patch may hide an error on your board but obviously does not address the real problem.
If dependencies in the shut down sequence should exist, we need to consider them in the driver model.
What is your plan to analyze the problem?
Best regards
Heinrich
problem is to make EFI behave like the do_bootm_states() process and ensure we call bootm_disable_interrupts() prior to ending up in usb_stop().
Cc: Ilias Apalodimas ilias.apalodimas@linaro.org Cc: Heinrich Schuchardt xypron.glpk@gmx.de Cc: Simon Glass sjg@chromium.org Suggested-by: Ilias Apalodimas ilias.apalodimas@linaro.org Signed-off-by: Tom Rini trini@konsulko.com
First up, as my wording above implies, I'm assuming rather than being 100% confident in why calling usb_stop() is leading to the exception I get. It's this call: /* Locate root hub device */ device_find_first_child(bus, &rh); that causes the exception. This board is also a little odd in that, borrowing from dm_dump_all(): nop 5 [ ] cdns-ti | |-- cdns-usb@4114000 usb 0 [ + ] cdns-usb3-host | | `-- usb@6400000 usb_hub 0 [ + ] usb_hub | | `-- usb_hub usb_hub 1 [ + ] usb_hub | | `-- usb_hub usb_mass_s 0 [ + ] usb_mass_storage | | `-- usb_mass_storage blk 2 [ + ] usb_storage_blk | | `-- usb_mass_storage.lun0 and physically, only that mass storage device is attached to the board itself, the rest is on-device.
Second, while talking with Ilias he's said he'll see if there can be some common function / abstractions done here between the do_bootm_states() code and the efi_exit_boot_services() code as this change shows other common code that's in arch/*/lib/bootm.c. The call to bootm_disable_interrupts() however I have tried to make clear is not handled in a common way as bootm/booti spell out the call as they don't use the BOOTM_STATE_LOADOS flag.
Third, I'm not 100% sure that 3d71c81a9bb0 is still the correct / ideal thing to be doing, and that's what usb_stop() is all about, at that point in the boot cycle. It might well be the kind of quirk that we have handled now, via DM.
Fourth, I really would like to figure out a fix here appropriate to v2022.01 and I think this is the least invasive approach.
lib/efi_loader/efi_boottime.c | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-)
diff --git a/lib/efi_loader/efi_boottime.c b/lib/efi_loader/efi_boottime.c index 1823990d9bd5..0485870d34d9 100644 --- a/lib/efi_loader/efi_boottime.c +++ b/lib/efi_loader/efi_boottime.c @@ -2154,6 +2154,7 @@ static efi_status_t EFIAPI efi_exit_boot_services(efi_handle_t image_handle, }
if (!efi_st_keep_devices) {
if (IS_ENABLED(CONFIG_USB_DEVICE)) udc_disconnect(); board_quiesce_devices();bootm_disable_interrupts();
@@ -2166,9 +2167,6 @@ static efi_status_t EFIAPI efi_exit_boot_services(efi_handle_t image_handle, /* Fix up caches for EFI payloads if necessary */ efi_exit_caches();
- /* This stops all lingering devices */
- bootm_disable_interrupts();
- /* Disable boot time services */ systab.con_in_handle = NULL; systab.con_in = NULL;

On Fri, Nov 19, 2021 at 10:52:27PM +0100, Heinrich Schuchardt wrote:
Am 19. November 2021 22:33:04 MEZ schrieb Tom Rini trini@konsulko.com:
If we look at the path that bootm/booti take when preparing to boot the OS, we see that as part of (or prior to calling do_bootm_states, explicitly) the process, bootm_disable_interrupts() is called prior to announce_and_cleanup() which is where udc_disconnect() / board_quiesce_devices() / dm_remove_devices_flags() are called from. In the EFI path, these are called afterwards. In efi_exit_boot_services() however we have been calling bootm_disable_interrupts() after the above functions, as part of ensuring that we disable interrupts as required by the spec. However, bootm_disable_interrupts() is also where we go and call usb_stop(). While this has been fine before, on the TI J721E platform this leads us to an exception. This exception seems likely to be the case that we're trying to stop devices that we have already disabled clocks for. The most direct way to handle this particular
This patch may hide an error on your board but obviously does not address the real problem.
If dependencies in the shut down sequence should exist, we need to consider them in the driver model.
What is your plan to analyze the problem?
I'm not sure there is a different problem to solve here. It's unsafe to call the "shut everything down" function, which is what usb_stop() is, after having shut everything down. We may be able to stop calling usb_stop() as any sort of shutdown should already have happened via driver model, which is what we see now.

On Sat, 20 Nov 2021 at 00:09, Tom Rini trini@konsulko.com wrote:
On Fri, Nov 19, 2021 at 10:52:27PM +0100, Heinrich Schuchardt wrote:
Am 19. November 2021 22:33:04 MEZ schrieb Tom Rini trini@konsulko.com:
If we look at the path that bootm/booti take when preparing to boot the OS, we see that as part of (or prior to calling do_bootm_states, explicitly) the process, bootm_disable_interrupts() is called prior to announce_and_cleanup() which is where udc_disconnect() / board_quiesce_devices() / dm_remove_devices_flags() are called from. In the EFI path, these are called afterwards. In efi_exit_boot_services() however we have been calling bootm_disable_interrupts() after the above functions, as part of ensuring that we disable interrupts as required by the spec. However, bootm_disable_interrupts() is also where we go and call usb_stop(). While this has been fine before, on the TI J721E platform this leads us to an exception. This exception seems likely to be the case that we're trying to stop devices that we have already disabled clocks for. The most direct way to handle this particular
This patch may hide an error on your board but obviously does not address the real problem.
I don't think it 'hides' it. I think it's the other way around, this board exposes the problem. In any case I think disabling irq's etc make sense to run before we disable devices completely.
If dependencies in the shut down sequence should exist, we need to consider them in the driver model.
Yes agree 100% here.
What is your plan to analyze the problem?
I'm not sure there is a different problem to solve here. It's unsafe to call the "shut everything down" function, which is what usb_stop() is, after having shut everything down. We may be able to stop calling usb_stop() as any sort of shutdown should already have happened via driver model, which is what we see now.
Linux has CONFIG_EFI_DISABLE_PCI_DMA to deal with similar problems on PCI devices. Basically it disables busmastering until linux properly configures the SMMU. I think disabling the devices before handing over to the OS (when possible) is a good policy. In any case I think this patch is reasonable since it at lest makes EFI and bootm/i behave similarly. I'll go have a look on bootm/i and cleanup the whole thing at some point.
Reviewed-by: Ilias Apalodimas ilias.apalodimas@linaro.org
-- Tom

From: Ilias Apalodimas ilias.apalodimas@linaro.org Date: Sat, 20 Nov 2021 10:20:23 +0200
On Sat, 20 Nov 2021 at 00:09, Tom Rini trini@konsulko.com wrote:
On Fri, Nov 19, 2021 at 10:52:27PM +0100, Heinrich Schuchardt wrote:
Am 19. November 2021 22:33:04 MEZ schrieb Tom Rini trini@konsulko.com:
If we look at the path that bootm/booti take when preparing to boot the OS, we see that as part of (or prior to calling do_bootm_states, explicitly) the process, bootm_disable_interrupts() is called prior to announce_and_cleanup() which is where udc_disconnect() / board_quiesce_devices() / dm_remove_devices_flags() are called from. In the EFI path, these are called afterwards. In efi_exit_boot_services() however we have been calling bootm_disable_interrupts() after the above functions, as part of ensuring that we disable interrupts as required by the spec. However, bootm_disable_interrupts() is also where we go and call usb_stop(). While this has been fine before, on the TI J721E platform this leads us to an exception. This exception seems likely to be the case that we're trying to stop devices that we have already disabled clocks for. The most direct way to handle this particular
This patch may hide an error on your board but obviously does not address the real problem.
I don't think it 'hides' it. I think it's the other way around, this board exposes the problem. In any case I think disabling irq's etc make sense to run before we disable devices completely.
If dependencies in the shut down sequence should exist, we need to consider them in the driver model.
Yes agree 100% here.
What is your plan to analyze the problem?
I'm not sure there is a different problem to solve here. It's unsafe to call the "shut everything down" function, which is what usb_stop() is, after having shut everything down. We may be able to stop calling usb_stop() as any sort of shutdown should already have happened via driver model, which is what we see now.
Linux has CONFIG_EFI_DISABLE_PCI_DMA to deal with similar problems on PCI devices. Basically it disables busmastering until linux properly configures the SMMU. I think disabling the devices before handing over to the OS (when possible) is a good policy. In any case I think this patch is reasonable since it at lest makes EFI and bootm/i behave similarly. I'll go have a look on bootm/i and cleanup the whole thing at some point.
Have to be careful here. Some OSes may assume (implicitly or explicitly) that the "firmware" has initialized the hardware to a point where it can be used. This is especially important for serial ports in order to have a debugging channel available in the OS as early as possible. But it is also good for things like USB PHYs and PCIe root complexes. If you tear those down completely (by doing a full reset and/or disabling associated clocks) you risk breaking OSes. This happened to OpenBSD on the Raspberry Pi 4 because earlier this year a change was committed that resets the PCIe root complex. And I think this has been responsible for some of the USB issues with Rockchip platforms as well.
But yes, quiescing DMA masters is necessary, and disabling interrupts is probably a good idea as well.
participants (4)
-
Heinrich Schuchardt
-
Ilias Apalodimas
-
Mark Kettenis
-
Tom Rini