Re: [PATCH] lmb: prohibit allocations above ram_top even from same bank

On Mon, 2 Dec 2024 at 19:16, E Shattow lucent@gmail.com wrote:
Hi Sughosh, Andreas, et. al
On Mon, Dec 2, 2024 at 3:58 AM Andreas Schwab schwab@suse.de wrote:
On Dez 02 2024, Sughosh Ganu wrote:
There are platforms which set the value of ram_top based on certain restrictions that the platform might have in accessing memory above ram_top, even when the memory region is in the same DRAM bank. So, even though the LMB allocator works as expected, when trying to allocate memory above ram_top, prohibit this by marking all memory above ram_top as reserved, even if the said memory region is from the same bank.
This fixes the boot failure on the VisionFive2 board as reported in mvm8qtnv310.fsf@suse.de.
Tested-by: Andreas Schwab schwab@suse.de
-- Andreas Schwab, SUSE Labs, schwab@suse.de GPG Key fingerprint = 0196 BAD8 1CE9 1970 F4BE 1748 E4D4 88E3 0EEA B9D7 "And now for something completely different."
Echoing what Andreas finds in testing, this avoids the platform issue with dw_mmc driver and the visionfive2 board support on 4GB Star64, 4GB Mars CM Lite, and 8GB Mars CM Lite WiFi.
diff --git a/lib/lmb.c b/lib/lmb.c index 14b9b8466ff..be168dfb8dc 100644 --- a/lib/lmb.c +++ b/lib/lmb.c @@ -615,6 +615,7 @@ static __maybe_unused void lmb_reserve_common_spl(void) void lmb_add_memory(void) { int i;
- phys_addr_t bank_end; phys_size_t size; u64 ram_top = gd->ram_top; struct bd_info *bd = gd->bd;
@@ -628,6 +629,8 @@ void lmb_add_memory(void)
for (i = 0; i < CONFIG_NR_DRAM_BANKS; i++) { size = bd->bi_dram[i].size;
bank_end = bd->bi_dram[i].start + size;
Delete newline? Unless existing style is wrong here.
It's not about style, but the above two lines are more of an initialisation within the block, which is the reason for a newline here.
if (size) { lmb_add(bd->bi_dram[i].start, size);
@@ -639,6 +642,9 @@ void lmb_add_memory(void) if (bd->bi_dram[i].start >= ram_top) lmb_reserve_flags(bd->bi_dram[i].start, size, LMB_NOOVERWRITE);
else if (bank_end > ram_top)
lmb_reserve_flags(ram_top, bank_end - ram_top,
LMB_NOOVERWRITE);
Please guard this somehow that it will be enabled per each platform that needs it, and/or give warning when conditions this specific code path is a workaround for are reached that are different than before the patch. Any similar platform-specific memory addressing bug/quirk having previously avoided a trigger will silently continue to work here instead of getting documented; we should at least uncover these assumptions.
Instead of putting some check here, and complicating the code, I would say that the board maintainer would know why the ram_top has not been put at the top of DRAM memory -- if possible, the ram_top should be the highest possible DRAM address. So this is the responsibility of the board maintainer instead. Ideally, for a 64 bit machine, like the one where you hit this issue, there is no reason for the board to not use the highest possible DRAM address as ram_top, unless there is some issue, like accessing memory address above a certain value. So, this would not be an issue if the board is using the highest possible value of memory as ram_top, especially when it comes to a single DRAM bank.
In the theoretical situation where we are to just fix the visionfive2 mmc platform specific issue in the dw_mmc driver and not need this workaround code anymore, there's not visibility for us to easily diagnose if a similar assumption or previously un-triggered platform quirk exists elsewhere. There's no warning, and no easy on/off config option to test with and without. The memory rework commit 22f2c9ed9f53 is here a useful tool to break and bring attention to assumptions in code and documentation.
Fixing the mmc driver issue may or may not be possible. And if this is indeed a fixable issue, then, after fixing the mmc driver, the subsequent change should also remove this restriction on the ram_top value, which will automatically take care of this restriction.
-sughosh
}
} }
CONFIG_EFI_LOADER_BOUNCE_BUFFER=y added to visionfive2 config in 6c791b6646c1 does not make sense to me as when I tested it did not avoid the platform issue. Restricting the memory allocations is an effective workaround to the visionfive2 platform issue. I leave that to the experts if a revert is appropriate given that we have an effective workaround now.
Thank you Sughosh! This is (meanwhile until more comments and review) a functional workaround.
Tested-by: E Shattow lucent@gmail.com
participants (1)
-
Sughosh Ganu