[PATCH 0/2] Fix device removal order for Apple dart iommu

Starting with v2024.10 dev_iommu_dma_unmap calls during device removal trigger a NULL pointer dereference since the the iommu device is removed before its user. The sparsely used DM_FLAG_VITAL flag is intended for this dependency. This series adds it to the Apple dart iommu driver and implements the two phased device removal to the EFI loader.
Signed-off-by: Janne Grunau j@jannau.net --- Janne Grunau (2): iommu: apple: Mark device with DM_FLAG_VITAL efi_loader: remove non vital devices first
drivers/iommu/apple_dart.c | 2 +- lib/efi_loader/efi_boottime.c | 1 + 2 files changed, 2 insertions(+), 1 deletion(-) --- base-commit: 1d147b74f437fb0e85821e8271fe52bc5fd30194 change-id: 20241031-iommu_apple_dart_ordering-558e62671512
Best regards,

Avoids NULL pointer dereferences in apple_dart_unmap when the iommu device is removed before its user. U-boot's device model does not track dependencies between devices. Observed on a M1 Ultra Mac Studio with v2024.10.
Signed-off-by: Janne Grunau j@jannau.net --- drivers/iommu/apple_dart.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/iommu/apple_dart.c b/drivers/iommu/apple_dart.c index 611ac7cd6deb4fccd45ccfe8cbb9b3e36e2af753..9df174f32d721c39079e65736c91e3e27a72ace2 100644 --- a/drivers/iommu/apple_dart.c +++ b/drivers/iommu/apple_dart.c @@ -312,5 +312,5 @@ U_BOOT_DRIVER(apple_dart) = { .ops = &apple_dart_ops, .probe = apple_dart_probe, .remove = apple_dart_remove, - .flags = DM_FLAG_OS_PREPARE + .flags = DM_FLAG_OS_PREPARE | DM_FLAG_VITAL };

On 10/31/24 23:48, Janne Grunau wrote:
Avoids NULL pointer dereferences in apple_dart_unmap when the iommu device is removed before its user. U-boot's device model does not track dependencies between devices. Observed on a M1 Ultra Mac Studio with v2024.10.
Signed-off-by: Janne Grunau j@jannau.net
Hello Simon,
could you, please, review this patch.
Best regards
Heinrich
drivers/iommu/apple_dart.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/iommu/apple_dart.c b/drivers/iommu/apple_dart.c index 611ac7cd6deb4fccd45ccfe8cbb9b3e36e2af753..9df174f32d721c39079e65736c91e3e27a72ace2 100644 --- a/drivers/iommu/apple_dart.c +++ b/drivers/iommu/apple_dart.c @@ -312,5 +312,5 @@ U_BOOT_DRIVER(apple_dart) = { .ops = &apple_dart_ops, .probe = apple_dart_probe, .remove = apple_dart_remove,
- .flags = DM_FLAG_OS_PREPARE
- .flags = DM_FLAG_OS_PREPARE | DM_FLAG_VITAL };

On 10/31/24 23:48, Janne Grunau wrote:
Avoids NULL pointer dereferences in apple_dart_unmap when the iommu device is removed before its user. U-boot's device model does not track dependencies between devices. Observed on a M1 Ultra Mac Studio with v2024.10.
Signed-off-by: Janne Grunau j@jannau.net
Hello Simon,
could you, please, review this patch.
Best regards
Heinrich
drivers/iommu/apple_dart.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/iommu/apple_dart.c b/drivers/iommu/apple_dart.c index 611ac7cd6deb4fccd45ccfe8cbb9b3e36e2af753..9df174f32d721c39079e65736c91e3e27a72ace2 100644 --- a/drivers/iommu/apple_dart.c +++ b/drivers/iommu/apple_dart.c @@ -312,5 +312,5 @@ U_BOOT_DRIVER(apple_dart) = { .ops = &apple_dart_ops, .probe = apple_dart_probe, .remove = apple_dart_remove,
- .flags = DM_FLAG_OS_PREPARE
- .flags = DM_FLAG_OS_PREPARE | DM_FLAG_VITAL };

From: Janne Grunau j@jannau.net Date: Thu, 31 Oct 2024 23:48:01 +0100
Avoids NULL pointer dereferences in apple_dart_unmap when the iommu device is removed before its user. U-boot's device model does not track dependencies between devices. Observed on a M1 Ultra Mac Studio with v2024.10.
Signed-off-by: Janne Grunau j@jannau.net
Acked-by: Mark Kettenis kettenis@openbsd.org
drivers/iommu/apple_dart.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/iommu/apple_dart.c b/drivers/iommu/apple_dart.c index 611ac7cd6deb4fccd45ccfe8cbb9b3e36e2af753..9df174f32d721c39079e65736c91e3e27a72ace2 100644 --- a/drivers/iommu/apple_dart.c +++ b/drivers/iommu/apple_dart.c @@ -312,5 +312,5 @@ U_BOOT_DRIVER(apple_dart) = { .ops = &apple_dart_ops, .probe = apple_dart_probe, .remove = apple_dart_remove,
- .flags = DM_FLAG_OS_PREPARE
- .flags = DM_FLAG_OS_PREPARE | DM_FLAG_VITAL
};
-- 2.47.0

DM_FLAG_VITAL marks devices which are essential for the operation of other devices. Removing these devices before their users can result in hangs or crashes. This potentially fixes EFI boot of Renesas rcar3 devices. Their clock devices (and with this series the dart iommu) are the only devices markes as vital. The arm boot code already handles devioce removal in this way.
Signed-off-by: Janne Grunau j@jannau.net --- lib/efi_loader/efi_boottime.c | 1 + 1 file changed, 1 insertion(+)
diff --git a/lib/efi_loader/efi_boottime.c b/lib/efi_loader/efi_boottime.c index 4f52284b4c653c252b0ed6c0c87da8901448d4b4..7db3c95782970f8c06a970a8ee86b1804cd848b6 100644 --- a/lib/efi_loader/efi_boottime.c +++ b/lib/efi_loader/efi_boottime.c @@ -2234,6 +2234,7 @@ static efi_status_t EFIAPI efi_exit_boot_services(efi_handle_t image_handle, if (IS_ENABLED(CONFIG_USB_DEVICE)) udc_disconnect(); board_quiesce_devices(); + dm_remove_devices_flags(DM_REMOVE_ACTIVE_ALL | DM_REMOVE_NON_VITAL); dm_remove_devices_flags(DM_REMOVE_ACTIVE_ALL); }

From: Janne Grunau j@jannau.net Date: Thu, 31 Oct 2024 23:48:02 +0100
DM_FLAG_VITAL marks devices which are essential for the operation of other devices. Removing these devices before their users can result in hangs or crashes. This potentially fixes EFI boot of Renesas rcar3 devices. Their clock devices (and with this series the dart iommu) are the only devices markes as vital. The arm boot code already handles devioce removal in this way.
There is a typo in that last sentence of the commit message (devioce). Otherwise:
Signed-off-by: Janne Grunau j@jannau.net
Reviewed-by: Mark Kettenis kettenis@openbsd.org
lib/efi_loader/efi_boottime.c | 1 + 1 file changed, 1 insertion(+)
diff --git a/lib/efi_loader/efi_boottime.c b/lib/efi_loader/efi_boottime.c index 4f52284b4c653c252b0ed6c0c87da8901448d4b4..7db3c95782970f8c06a970a8ee86b1804cd848b6 100644 --- a/lib/efi_loader/efi_boottime.c +++ b/lib/efi_loader/efi_boottime.c @@ -2234,6 +2234,7 @@ static efi_status_t EFIAPI efi_exit_boot_services(efi_handle_t image_handle, if (IS_ENABLED(CONFIG_USB_DEVICE)) udc_disconnect(); board_quiesce_devices();
dm_remove_devices_flags(DM_REMOVE_ACTIVE_ALL); }dm_remove_devices_flags(DM_REMOVE_ACTIVE_ALL | DM_REMOVE_NON_VITAL);
-- 2.47.0

On 11/1/24 21:29, Mark Kettenis wrote:
From: Janne Grunau j@jannau.net Date: Thu, 31 Oct 2024 23:48:02 +0100
DM_FLAG_VITAL marks devices which are essential for the operation of other devices. Removing these devices before their users can result in hangs or crashes. This potentially fixes EFI boot of Renesas rcar3 devices. Their clock devices (and with this series the dart iommu) are the only devices markes as vital. The arm boot code already handles devioce removal in this way.
There is a typo in that last sentence of the commit message (devioce). Otherwise:
Signed-off-by: Janne Grunau j@jannau.net
Reviewed-by: Mark Kettenis kettenis@openbsd.org
lib/efi_loader/efi_boottime.c | 1 + 1 file changed, 1 insertion(+)
diff --git a/lib/efi_loader/efi_boottime.c b/lib/efi_loader/efi_boottime.c index 4f52284b4c653c252b0ed6c0c87da8901448d4b4..7db3c95782970f8c06a970a8ee86b1804cd848b6 100644 --- a/lib/efi_loader/efi_boottime.c +++ b/lib/efi_loader/efi_boottime.c @@ -2234,6 +2234,7 @@ static efi_status_t EFIAPI efi_exit_boot_services(efi_handle_t image_handle, if (IS_ENABLED(CONFIG_USB_DEVICE)) udc_disconnect(); board_quiesce_devices();
dm_remove_devices_flags(DM_REMOVE_ACTIVE_ALL);dm_remove_devices_flags(DM_REMOVE_ACTIVE_ALL | DM_REMOVE_NON_VITAL);
Simon's patch 6224dc9ba428 ("arm: Remove vital devices last") addressed the same issue for bootm on arm. But what about about other architectures?
This logic should be moved to drivers/core/root.c instead of replicating code.
Best regards
Heinrich
}
-- 2.47.0

Hi,
On Wed, 13 Nov 2024 at 05:52, Heinrich Schuchardt xypron.glpk@gmx.de wrote:
On 11/1/24 21:29, Mark Kettenis wrote:
From: Janne Grunau j@jannau.net Date: Thu, 31 Oct 2024 23:48:02 +0100
DM_FLAG_VITAL marks devices which are essential for the operation of other devices. Removing these devices before their users can result in hangs or crashes. This potentially fixes EFI boot of Renesas rcar3 devices. Their clock devices (and with this series the dart iommu) are the only devices markes as vital. The arm boot code already handles devioce removal in this way.
There is a typo in that last sentence of the commit message (devioce). Otherwise:
Signed-off-by: Janne Grunau j@jannau.net
Reviewed-by: Mark Kettenis kettenis@openbsd.org
lib/efi_loader/efi_boottime.c | 1 + 1 file changed, 1 insertion(+)
diff --git a/lib/efi_loader/efi_boottime.c b/lib/efi_loader/efi_boottime.c index 4f52284b4c653c252b0ed6c0c87da8901448d4b4..7db3c95782970f8c06a970a8ee86b1804cd848b6 100644 --- a/lib/efi_loader/efi_boottime.c +++ b/lib/efi_loader/efi_boottime.c @@ -2234,6 +2234,7 @@ static efi_status_t EFIAPI efi_exit_boot_services(efi_handle_t image_handle, if (IS_ENABLED(CONFIG_USB_DEVICE)) udc_disconnect(); board_quiesce_devices();
dm_remove_devices_flags(DM_REMOVE_ACTIVE_ALL | DM_REMOVE_NON_VITAL); dm_remove_devices_flags(DM_REMOVE_ACTIVE_ALL);
Simon's patch 6224dc9ba428 ("arm: Remove vital devices last") addressed the same issue for bootm on arm. But what about about other architectures?
This logic should be moved to drivers/core/root.c instead of replicating code.
We could have a common helper, but it should not be in driver/core as this ordering is more of a policy decision. Unless we can add a parameter telling dm exactly what to do...
BTW, Heinrich, this behaviour is exactly what my bootflow_efi() test was supposed to check. But since it doesn't have the exit-boot-services piece at your request...
Regards, Simon

Am 13. November 2024 15:39:22 MEZ schrieb Simon Glass sjg@chromium.org:
Hi,
On Wed, 13 Nov 2024 at 05:52, Heinrich Schuchardt xypron.glpk@gmx.de wrote:
On 11/1/24 21:29, Mark Kettenis wrote:
From: Janne Grunau j@jannau.net Date: Thu, 31 Oct 2024 23:48:02 +0100
DM_FLAG_VITAL marks devices which are essential for the operation of other devices. Removing these devices before their users can result in hangs or crashes. This potentially fixes EFI boot of Renesas rcar3 devices. Their clock devices (and with this series the dart iommu) are the only devices markes as vital. The arm boot code already handles devioce removal in this way.
There is a typo in that last sentence of the commit message (devioce). Otherwise:
Signed-off-by: Janne Grunau j@jannau.net
Reviewed-by: Mark Kettenis kettenis@openbsd.org
lib/efi_loader/efi_boottime.c | 1 + 1 file changed, 1 insertion(+)
diff --git a/lib/efi_loader/efi_boottime.c b/lib/efi_loader/efi_boottime.c index 4f52284b4c653c252b0ed6c0c87da8901448d4b4..7db3c95782970f8c06a970a8ee86b1804cd848b6 100644 --- a/lib/efi_loader/efi_boottime.c +++ b/lib/efi_loader/efi_boottime.c @@ -2234,6 +2234,7 @@ static efi_status_t EFIAPI efi_exit_boot_services(efi_handle_t image_handle, if (IS_ENABLED(CONFIG_USB_DEVICE)) udc_disconnect(); board_quiesce_devices();
dm_remove_devices_flags(DM_REMOVE_ACTIVE_ALL | DM_REMOVE_NON_VITAL); dm_remove_devices_flags(DM_REMOVE_ACTIVE_ALL);
Simon's patch 6224dc9ba428 ("arm: Remove vital devices last") addressed the same issue for bootm on arm. But what about about other architectures?
This logic should be moved to drivers/core/root.c instead of replicating code.
We could have a common helper, but it should not be in driver/core as this ordering is more of a policy decision. Unless we can add a parameter telling dm exactly what to do...
BTW, Heinrich, this behaviour is exactly what my bootflow_efi() test was supposed to check. But since it doesn't have the exit-boot-services piece at your request...
Regards, Simon
Why can't we generally remove non-vital devices first if all are to be removed?
I cannot see anything device specific here.
Best regards
Heinrich

Hi Heinrich,
On Wed, 13 Nov 2024 at 08:17, Heinrich Schuchardt xypron.glpk@gmx.de wrote:
Am 13. November 2024 15:39:22 MEZ schrieb Simon Glass sjg@chromium.org:
Hi,
On Wed, 13 Nov 2024 at 05:52, Heinrich Schuchardt xypron.glpk@gmx.de
wrote:
On 11/1/24 21:29, Mark Kettenis wrote:
From: Janne Grunau j@jannau.net Date: Thu, 31 Oct 2024 23:48:02 +0100
DM_FLAG_VITAL marks devices which are essential for the operation of other devices. Removing these devices before their users can result
in
hangs or crashes. This potentially fixes EFI boot of Renesas rcar3 devices. Their
clock
devices (and with this series the dart iommu) are the only devices markes as vital. The arm boot code already handles devioce removal in this way.
There is a typo in that last sentence of the commit message
(devioce).
Otherwise:
Signed-off-by: Janne Grunau j@jannau.net
Reviewed-by: Mark Kettenis kettenis@openbsd.org
lib/efi_loader/efi_boottime.c | 1 + 1 file changed, 1 insertion(+)
diff --git a/lib/efi_loader/efi_boottime.c
b/lib/efi_loader/efi_boottime.c
index
4f52284b4c653c252b0ed6c0c87da8901448d4b4..7db3c95782970f8c06a970a8ee86b1804cd848b6 100644
--- a/lib/efi_loader/efi_boottime.c +++ b/lib/efi_loader/efi_boottime.c @@ -2234,6 +2234,7 @@ static efi_status_t EFIAPI
efi_exit_boot_services(efi_handle_t image_handle,
if (IS_ENABLED(CONFIG_USB_DEVICE)) udc_disconnect(); board_quiesce_devices();
dm_remove_devices_flags(DM_REMOVE_ACTIVE_ALL |
DM_REMOVE_NON_VITAL);
dm_remove_devices_flags(DM_REMOVE_ACTIVE_ALL);
Simon's patch 6224dc9ba428 ("arm: Remove vital devices last") addressed the same issue for bootm on arm. But what about about other
architectures?
This logic should be moved to drivers/core/root.c instead of
replicating
code.
We could have a common helper, but it should not be in driver/core as this ordering is more of a policy decision. Unless we can add a parameter telling dm exactly what to do...
BTW, Heinrich, this behaviour is exactly what my bootflow_efi() test was supposed to check. But since it doesn't have the exit-boot-services piece at your request...
Regards, Simon
Why can't we generally remove non-vital devices first if all are to be
removed?
I cannot see anything device specific here.
No objection to that...but it needs to be in a new function, not become the default behaviour of dm_remove_devices_flags(), which is supposed to do what it is told and in one pass.
Regards, Simon

Am 13. November 2024 17:03:03 MEZ schrieb Simon Glass sjg@chromium.org:
Hi Heinrich,
On Wed, 13 Nov 2024 at 08:17, Heinrich Schuchardt xypron.glpk@gmx.de wrote:
Am 13. November 2024 15:39:22 MEZ schrieb Simon Glass sjg@chromium.org:
Hi,
On Wed, 13 Nov 2024 at 05:52, Heinrich Schuchardt xypron.glpk@gmx.de
wrote:
On 11/1/24 21:29, Mark Kettenis wrote:
From: Janne Grunau j@jannau.net Date: Thu, 31 Oct 2024 23:48:02 +0100
DM_FLAG_VITAL marks devices which are essential for the operation of other devices. Removing these devices before their users can result
in
hangs or crashes. This potentially fixes EFI boot of Renesas rcar3 devices. Their
clock
devices (and with this series the dart iommu) are the only devices markes as vital. The arm boot code already handles devioce removal in this way.
There is a typo in that last sentence of the commit message
(devioce).
Otherwise:
Signed-off-by: Janne Grunau j@jannau.net
Reviewed-by: Mark Kettenis kettenis@openbsd.org
lib/efi_loader/efi_boottime.c | 1 + 1 file changed, 1 insertion(+)
diff --git a/lib/efi_loader/efi_boottime.c
b/lib/efi_loader/efi_boottime.c
index
4f52284b4c653c252b0ed6c0c87da8901448d4b4..7db3c95782970f8c06a970a8ee86b1804cd848b6 100644
--- a/lib/efi_loader/efi_boottime.c +++ b/lib/efi_loader/efi_boottime.c @@ -2234,6 +2234,7 @@ static efi_status_t EFIAPI
efi_exit_boot_services(efi_handle_t image_handle,
if (IS_ENABLED(CONFIG_USB_DEVICE)) udc_disconnect(); board_quiesce_devices();
dm_remove_devices_flags(DM_REMOVE_ACTIVE_ALL |
DM_REMOVE_NON_VITAL);
dm_remove_devices_flags(DM_REMOVE_ACTIVE_ALL);
Simon's patch 6224dc9ba428 ("arm: Remove vital devices last") addressed the same issue for bootm on arm. But what about about other
architectures?
This logic should be moved to drivers/core/root.c instead of
replicating
code.
We could have a common helper, but it should not be in driver/core as this ordering is more of a policy decision. Unless we can add a parameter telling dm exactly what to do...
BTW, Heinrich, this behaviour is exactly what my bootflow_efi() test was supposed to check. But since it doesn't have the exit-boot-services piece at your request...
Regards, Simon
Why can't we generally remove non-vital devices first if all are to be
removed?
I cannot see anything device specific here.
No objection to that...but it needs to be in a new function, not become the default behaviour of dm_remove_devices_flags(), which is supposed to do what it is told and in one pass.
dm_remove_device_flags() makes no specific promises concerning the deletion sequence and its internal working. But obviously it will fail if non-vital redources are not deleted first.
Leaving this broken function exported and writing a new one has no user benefit.
We could copy the old function to a new static function that we call as needed from the old function name.
Best regards
Heinruch
Regards, Simon

Hi Heinrich,
On Wed, 13 Nov 2024 at 11:45, Heinrich Schuchardt xypron.glpk@gmx.de wrote:
Am 13. November 2024 17:03:03 MEZ schrieb Simon Glass sjg@chromium.org:
Hi Heinrich,
On Wed, 13 Nov 2024 at 08:17, Heinrich Schuchardt xypron.glpk@gmx.de wrote:
Am 13. November 2024 15:39:22 MEZ schrieb Simon Glass sjg@chromium.org:
Hi,
On Wed, 13 Nov 2024 at 05:52, Heinrich Schuchardt xypron.glpk@gmx.de
wrote:
On 11/1/24 21:29, Mark Kettenis wrote:
> From: Janne Grunau j@jannau.net > Date: Thu, 31 Oct 2024 23:48:02 +0100 > > DM_FLAG_VITAL marks devices which are essential for the operation of > other devices. Removing these devices before their users can result
in
> hangs or crashes. > This potentially fixes EFI boot of Renesas rcar3 devices. Their
clock
> devices (and with this series the dart iommu) are the only devices > markes as vital. > The arm boot code already handles devioce removal in this way.
There is a typo in that last sentence of the commit message
(devioce).
Otherwise:
> Signed-off-by: Janne Grunau j@jannau.net
Reviewed-by: Mark Kettenis kettenis@openbsd.org
> --- > lib/efi_loader/efi_boottime.c | 1 + > 1 file changed, 1 insertion(+) > > diff --git a/lib/efi_loader/efi_boottime.c
b/lib/efi_loader/efi_boottime.c
> index
4f52284b4c653c252b0ed6c0c87da8901448d4b4..7db3c95782970f8c06a970a8ee86b1804cd848b6 100644
> --- a/lib/efi_loader/efi_boottime.c > +++ b/lib/efi_loader/efi_boottime.c > @@ -2234,6 +2234,7 @@ static efi_status_t EFIAPI
efi_exit_boot_services(efi_handle_t image_handle,
> if (IS_ENABLED(CONFIG_USB_DEVICE)) > udc_disconnect(); > board_quiesce_devices(); > + dm_remove_devices_flags(DM_REMOVE_ACTIVE_ALL |
DM_REMOVE_NON_VITAL);
> dm_remove_devices_flags(DM_REMOVE_ACTIVE_ALL);
Simon's patch 6224dc9ba428 ("arm: Remove vital devices last") addressed the same issue for bootm on arm. But what about about other
architectures?
This logic should be moved to drivers/core/root.c instead of
replicating
code.
We could have a common helper, but it should not be in driver/core as this ordering is more of a policy decision. Unless we can add a parameter telling dm exactly what to do...
BTW, Heinrich, this behaviour is exactly what my bootflow_efi() test was supposed to check. But since it doesn't have the exit-boot-services piece at your request...
Regards, Simon
Why can't we generally remove non-vital devices first if all are to be
removed?
I cannot see anything device specific here.
No objection to that...but it needs to be in a new function, not become the default behaviour of dm_remove_devices_flags(), which is supposed to do what it is told and in one pass.
dm_remove_device_flags() makes no specific promises concerning the deletion sequence and its internal working.
Do you mean in the function comment in the header file? We can certainly update that.
But obviously it will fail if non-vital redources are not deleted first.
Either I don't understand that, or I don't agree. Can you give a specific example?
Leaving this broken function exported and writing a new one has no user benefit.
This function works fine. It is not broken.
We could copy the old function to a new static function that we call as needed from the old function name.
Yes, that's fine with me. Also, did you see my note about the bootflow_efi() test?
Regards, Simon

On Wed, Nov 13, 2024 at 08:53:17PM -0700, Simon Glass wrote:
Hi Heinrich,
On Wed, 13 Nov 2024 at 11:45, Heinrich Schuchardt xypron.glpk@gmx.de wrote:
Am 13. November 2024 17:03:03 MEZ schrieb Simon Glass sjg@chromium.org:
Hi Heinrich,
On Wed, 13 Nov 2024 at 08:17, Heinrich Schuchardt xypron.glpk@gmx.de wrote:
Am 13. November 2024 15:39:22 MEZ schrieb Simon Glass sjg@chromium.org:
Hi,
On Wed, 13 Nov 2024 at 05:52, Heinrich Schuchardt xypron.glpk@gmx.de
wrote:
On 11/1/24 21:29, Mark Kettenis wrote: >> From: Janne Grunau j@jannau.net >> Date: Thu, 31 Oct 2024 23:48:02 +0100 >> >> DM_FLAG_VITAL marks devices which are essential for the operation of >> other devices. Removing these devices before their users can result
in
>> hangs or crashes. >> This potentially fixes EFI boot of Renesas rcar3 devices. Their
clock
>> devices (and with this series the dart iommu) are the only devices >> markes as vital. >> The arm boot code already handles devioce removal in this way. > > There is a typo in that last sentence of the commit message
(devioce).
> Otherwise: > >> Signed-off-by: Janne Grunau j@jannau.net > > Reviewed-by: Mark Kettenis kettenis@openbsd.org > >> --- >> lib/efi_loader/efi_boottime.c | 1 + >> 1 file changed, 1 insertion(+) >> >> diff --git a/lib/efi_loader/efi_boottime.c
b/lib/efi_loader/efi_boottime.c
>> index
4f52284b4c653c252b0ed6c0c87da8901448d4b4..7db3c95782970f8c06a970a8ee86b1804cd848b6 100644
>> --- a/lib/efi_loader/efi_boottime.c >> +++ b/lib/efi_loader/efi_boottime.c >> @@ -2234,6 +2234,7 @@ static efi_status_t EFIAPI
efi_exit_boot_services(efi_handle_t image_handle,
>> if (IS_ENABLED(CONFIG_USB_DEVICE)) >> udc_disconnect(); >> board_quiesce_devices(); >> + dm_remove_devices_flags(DM_REMOVE_ACTIVE_ALL |
DM_REMOVE_NON_VITAL);
>> dm_remove_devices_flags(DM_REMOVE_ACTIVE_ALL);
Simon's patch 6224dc9ba428 ("arm: Remove vital devices last") addressed the same issue for bootm on arm. But what about about other
architectures?
This logic should be moved to drivers/core/root.c instead of
replicating
code.
We could have a common helper, but it should not be in driver/core as this ordering is more of a policy decision. Unless we can add a parameter telling dm exactly what to do...
BTW, Heinrich, this behaviour is exactly what my bootflow_efi() test was supposed to check. But since it doesn't have the exit-boot-services piece at your request...
Regards, Simon
Why can't we generally remove non-vital devices first if all are to be
removed?
I cannot see anything device specific here.
No objection to that...but it needs to be in a new function, not become the default behaviour of dm_remove_devices_flags(), which is supposed to do what it is told and in one pass.
dm_remove_device_flags() makes no specific promises concerning the deletion sequence and its internal working.
Do you mean in the function comment in the header file? We can certainly update that.
That seems like "fixing" the problem by documenting that things should work in the seemingly broken way.
But obviously it will fail if non-vital redources are not deleted first.
Either I don't understand that, or I don't agree. Can you give a specific example?
The Apple IOMMU? That's what got this thread started.
Leaving this broken function exported and writing a new one has no user benefit.
This function works fine. It is not broken.
We could copy the old function to a new static function that we call as needed from the old function name.
Yes, that's fine with me. Also, did you see my note about the bootflow_efi() test?
But since your test passed (I assume) and real platforms are failing and requiring a fix, I'm not sure this is a strong argument? Or did you have a patch in a later series that fixed the problem here as well?

Hi Tom,
On Thu, 14 Nov 2024 at 07:26, Tom Rini trini@konsulko.com wrote:
On Wed, Nov 13, 2024 at 08:53:17PM -0700, Simon Glass wrote:
Hi Heinrich,
On Wed, 13 Nov 2024 at 11:45, Heinrich Schuchardt xypron.glpk@gmx.de wrote:
Am 13. November 2024 17:03:03 MEZ schrieb Simon Glass sjg@chromium.org:
Hi Heinrich,
On Wed, 13 Nov 2024 at 08:17, Heinrich Schuchardt xypron.glpk@gmx.de wrote:
Am 13. November 2024 15:39:22 MEZ schrieb Simon Glass sjg@chromium.org:
Hi,
On Wed, 13 Nov 2024 at 05:52, Heinrich Schuchardt xypron.glpk@gmx.de
wrote:
> > On 11/1/24 21:29, Mark Kettenis wrote: > >> From: Janne Grunau j@jannau.net > >> Date: Thu, 31 Oct 2024 23:48:02 +0100 > >> > >> DM_FLAG_VITAL marks devices which are essential for the operation of > >> other devices. Removing these devices before their users can result
in
> >> hangs or crashes. > >> This potentially fixes EFI boot of Renesas rcar3 devices. Their
clock
> >> devices (and with this series the dart iommu) are the only devices > >> markes as vital. > >> The arm boot code already handles devioce removal in this way. > > > > There is a typo in that last sentence of the commit message
(devioce).
> > Otherwise: > > > >> Signed-off-by: Janne Grunau j@jannau.net > > > > Reviewed-by: Mark Kettenis kettenis@openbsd.org > > > >> --- > >> lib/efi_loader/efi_boottime.c | 1 + > >> 1 file changed, 1 insertion(+) > >> > >> diff --git a/lib/efi_loader/efi_boottime.c
b/lib/efi_loader/efi_boottime.c
> >> index
4f52284b4c653c252b0ed6c0c87da8901448d4b4..7db3c95782970f8c06a970a8ee86b1804cd848b6 100644
> >> --- a/lib/efi_loader/efi_boottime.c > >> +++ b/lib/efi_loader/efi_boottime.c > >> @@ -2234,6 +2234,7 @@ static efi_status_t EFIAPI
efi_exit_boot_services(efi_handle_t image_handle,
> >> if (IS_ENABLED(CONFIG_USB_DEVICE)) > >> udc_disconnect(); > >> board_quiesce_devices(); > >> + dm_remove_devices_flags(DM_REMOVE_ACTIVE_ALL |
DM_REMOVE_NON_VITAL);
> >> dm_remove_devices_flags(DM_REMOVE_ACTIVE_ALL); > > Simon's patch 6224dc9ba428 ("arm: Remove vital devices last") addressed > the same issue for bootm on arm. But what about about other
architectures?
> > This logic should be moved to drivers/core/root.c instead of
replicating
> code.
We could have a common helper, but it should not be in driver/core as this ordering is more of a policy decision. Unless we can add a parameter telling dm exactly what to do...
BTW, Heinrich, this behaviour is exactly what my bootflow_efi() test was supposed to check. But since it doesn't have the exit-boot-services piece at your request...
Regards, Simon
Why can't we generally remove non-vital devices first if all are to be
removed?
I cannot see anything device specific here.
No objection to that...but it needs to be in a new function, not become the default behaviour of dm_remove_devices_flags(), which is supposed to do what it is told and in one pass.
dm_remove_device_flags() makes no specific promises concerning the deletion sequence and its internal working.
Do you mean in the function comment in the header file? We can certainly update that.
That seems like "fixing" the problem by documenting that things should work in the seemingly broken way.
But obviously it will fail if non-vital redources are not deleted first.
Either I don't understand that, or I don't agree. Can you give a specific example?
The Apple IOMMU? That's what got this thread started.
Leaving this broken function exported and writing a new one has no user benefit.
This function works fine. It is not broken.
We could copy the old function to a new static function that we call as needed from the old function name.
Yes, that's fine with me. Also, did you see my note about the bootflow_efi() test?
But since your test passed (I assume) and real platforms are failing and requiring a fix, I'm not sure this is a strong argument? Or did you have a patch in a later series that fixed the problem here as well?
My comments were to Heinrich, not this patch. This patch is fine.
Heinrich, if you want to clean this up, please create a new function which does these two steps separately.
Reviewed-by: Simon Glass sjg@chromium.org
Regards, Simon

From: Simon Glass sjg@chromium.org Date: Wed, 13 Nov 2024 07:39:22 -0700
Hi,
Hi Simon,
On Wed, 13 Nov 2024 at 05:52, Heinrich Schuchardt xypron.glpk@gmx.de wrote:
On 11/1/24 21:29, Mark Kettenis wrote:
From: Janne Grunau j@jannau.net Date: Thu, 31 Oct 2024 23:48:02 +0100
DM_FLAG_VITAL marks devices which are essential for the operation of other devices. Removing these devices before their users can result in hangs or crashes. This potentially fixes EFI boot of Renesas rcar3 devices. Their clock devices (and with this series the dart iommu) are the only devices markes as vital. The arm boot code already handles devioce removal in this way.
There is a typo in that last sentence of the commit message (devioce). Otherwise:
Signed-off-by: Janne Grunau j@jannau.net
Reviewed-by: Mark Kettenis kettenis@openbsd.org
lib/efi_loader/efi_boottime.c | 1 + 1 file changed, 1 insertion(+)
diff --git a/lib/efi_loader/efi_boottime.c b/lib/efi_loader/efi_boottime.c index 4f52284b4c653c252b0ed6c0c87da8901448d4b4..7db3c95782970f8c06a970a8ee86b1804cd848b6 100644 --- a/lib/efi_loader/efi_boottime.c +++ b/lib/efi_loader/efi_boottime.c @@ -2234,6 +2234,7 @@ static efi_status_t EFIAPI efi_exit_boot_services(efi_handle_t image_handle, if (IS_ENABLED(CONFIG_USB_DEVICE)) udc_disconnect(); board_quiesce_devices();
dm_remove_devices_flags(DM_REMOVE_ACTIVE_ALL | DM_REMOVE_NON_VITAL); dm_remove_devices_flags(DM_REMOVE_ACTIVE_ALL);
Simon's patch 6224dc9ba428 ("arm: Remove vital devices last") addressed the same issue for bootm on arm. But what about about other architectures?
This logic should be moved to drivers/core/root.c instead of replicating code.
We could have a common helper, but it should not be in driver/core as this ordering is more of a policy decision. Unless we can add a parameter telling dm exactly what to do...
But I don't think it makes sense for this to be a per-architecture policy (like it is now).
Also, not that outside of the testsuite, we currently either do:
dm_remove_devices_flags(DM_REMOVE_ACTIVE_ALL | DM_REMOVE_NON_VITAL); dm_remove_devices_flags(DM_REMOVE_ACTIVE_ALL);
or just:
dm_remove_devices_flags(DM_REMOVE_ACTIVE_ALL);
and I'd argue that all instances of the latter should be converted into the latter. So we really would only have a single policy here...

On Fri, 1 Nov 2024 at 04:37, Janne Grunau j@jannau.net wrote:
Starting with v2024.10 dev_iommu_dma_unmap calls during device removal trigger a NULL pointer dereference since the the iommu device is removed before its user. The sparsely used DM_FLAG_VITAL flag is intended for this dependency. This series adds it to the Apple dart iommu driver and implements the two phased device removal to the EFI loader.
Is this also the cause of the crash that you were observing with the RFC patches that I had posted earlier?
-sughosh
Signed-off-by: Janne Grunau j@jannau.net
Janne Grunau (2): iommu: apple: Mark device with DM_FLAG_VITAL efi_loader: remove non vital devices first
drivers/iommu/apple_dart.c | 2 +- lib/efi_loader/efi_boottime.c | 1 + 2 files changed, 2 insertions(+), 1 deletion(-)
base-commit: 1d147b74f437fb0e85821e8271fe52bc5fd30194 change-id: 20241031-iommu_apple_dart_ordering-558e62671512
Best regards,
Janne Grunau j@jannau.net

On Fri, Nov 01, 2024 at 11:52:26AM +0530, Sughosh Ganu wrote:
On Fri, 1 Nov 2024 at 04:37, Janne Grunau j@jannau.net wrote:
Starting with v2024.10 dev_iommu_dma_unmap calls during device removal trigger a NULL pointer dereference since the the iommu device is removed before its user. The sparsely used DM_FLAG_VITAL flag is intended for this dependency. This series adds it to the Apple dart iommu driver and implements the two phased device removal to the EFI loader.
Is this also the cause of the crash that you were observing with the RFC patches that I had posted earlier?
It shouldn't as I applied those patches on top of this. This is a separate issue already present in v2024.10. I discovered the LMB changes while investigating this.
Janne

On Thu, Oct 31, 2024 at 11:48:00PM +0100, Janne Grunau wrote:
Starting with v2024.10 dev_iommu_dma_unmap calls during device removal trigger a NULL pointer dereference since the the iommu device is removed before its user. The sparsely used DM_FLAG_VITAL flag is intended for this dependency. This series adds it to the Apple dart iommu driver and implements the two phased device removal to the EFI loader.
Can we get this two small patches merged? They fix a regression although it worked previously just accidentally. The only drivers using DM_FLAG_VITAL are clk-rcar-gen3.c, rzg2l-cpg.c and apple_dart.c all used on arm SoCs. arch/arm/lib/bootm.c already uses a device_remove call with DM_REMOVE_NON_VITAL so adding it to efi_loader looks reasonable as regression fix.
Janne

On Thu, Nov 21, 2024 at 08:39:26AM +0100, Janne Grunau wrote:
On Thu, Oct 31, 2024 at 11:48:00PM +0100, Janne Grunau wrote:
Starting with v2024.10 dev_iommu_dma_unmap calls during device removal trigger a NULL pointer dereference since the the iommu device is removed before its user. The sparsely used DM_FLAG_VITAL flag is intended for this dependency. This series adds it to the Apple dart iommu driver and implements the two phased device removal to the EFI loader.
Can we get this two small patches merged? They fix a regression although it worked previously just accidentally. The only drivers using DM_FLAG_VITAL are clk-rcar-gen3.c, rzg2l-cpg.c and apple_dart.c all used on arm SoCs. arch/arm/lib/bootm.c already uses a device_remove call with DM_REMOVE_NON_VITAL so adding it to efi_loader looks reasonable as regression fix.
Can you please take a look at doing the changes Heinrich suggested I believe in 2/2 ?
participants (6)
-
Heinrich Schuchardt
-
Janne Grunau
-
Mark Kettenis
-
Simon Glass
-
Sughosh Ganu
-
Tom Rini