When running out of memory get a chunk of memory from ZoneTmpHigh to expand ZoneHigh. Drop simliar logic fro pmm code because it's not needed ay more.
This fixes some scalability problems, for example with lots of vcpus, where seabios runs out of memory due to large smbios/acpi tables.
Gerd Hoffmann (2): malloc: add on demand ZoneHigh expansion support Revert "pmm: use tmp zone on oom"
src/malloc.c | 16 ++++++++++++++++ src/pmm.c | 13 ------------- 2 files changed, 16 insertions(+), 13 deletions(-)
When running out of memory in ZoneHigh go alloc memory from ZoneTmpHigh, add it as reserved to the e820 map and expand ZoneHigh
Signed-off-by: Gerd Hoffmann kraxel@redhat.com --- src/malloc.c | 16 ++++++++++++++++ 1 file changed, 16 insertions(+)
diff --git a/src/malloc.c b/src/malloc.c index 3733855caf2c..02067b5370f9 100644 --- a/src/malloc.c +++ b/src/malloc.c @@ -226,6 +226,20 @@ zonelow_expand(u32 size, u32 align, struct allocinfo_s *fill) return alloc_new(&ZoneLow, size, align, fill); }
+static u32 +zonehigh_expand(u32 size, u32 align, struct allocinfo_s *fill) +{ + u32 xsize = ALIGN(size, BUILD_MAX_HIGHTABLE); + u32 xalign = align > PAGE_SIZE ? align : PAGE_SIZE; + u32 xdata = malloc_palloc(&ZoneTmpHigh, xsize, xalign); + if (!xdata) + return 0; + + alloc_add(&ZoneHigh, xdata, xdata + xsize); + e820_add(xdata, size, E820_RESERVED); + return alloc_new(&ZoneHigh, size, align, fill); +} +
/**************************************************************** * tracked memory allocations @@ -245,6 +259,8 @@ malloc_palloc(struct zone_s *zone, u32 size, u32 align) u32 data = alloc_new(zone, size, align, &tempdetail.datainfo); if (!CONFIG_MALLOC_UPPERMEMORY && !data && zone == &ZoneLow) data = zonelow_expand(size, align, &tempdetail.datainfo); + if (!data && zone == &ZoneHigh) + data = zonehigh_expand(size, align, &tempdetail.datainfo); if (!data) return 0;
This reverts commit a638acfa4cc772b42093c8bfe55669829a641293.
Not needed any more now that ZoneHigh is expanded on demand.
Signed-off-by: Gerd Hoffmann kraxel@redhat.com --- src/pmm.c | 13 ------------- 1 file changed, 13 deletions(-)
diff --git a/src/pmm.c b/src/pmm.c index 28b253b2d33c..640341472486 100644 --- a/src/pmm.c +++ b/src/pmm.c @@ -8,7 +8,6 @@ #include "config.h" // CONFIG_* #include "malloc.h" // _malloc #include "output.h" // dprintf -#include "e820map.h" // struct e820entry #include "std/pmm.h" // PMM_SIGNATURE #include "string.h" // checksum #include "util.h" // pmm_init @@ -76,18 +75,6 @@ handle_pmm00(u16 *args) break; case 2: data = malloc_palloc(highzone, size, align); - if (!data && (flags & 8)) { - /* - * We are out of meory. So go allocate from the (big) - * ZoneTmpHigh instead and reserve the block in the e820 - * map so the OS will not override it. That way we can - * handle big permanent allocations without needing a big - * ZoneHigh. - */ - data = malloc_palloc(&ZoneTmpHigh, size, align); - if (data) - e820_add(data, size, E820_RESERVED); - } break; case 3: { data = malloc_palloc(lowzone, size, align);
On Thu, 21 Apr 2022 11:33:24 +0200 Gerd Hoffmann kraxel@redhat.com wrote:
When running out of memory get a chunk of memory from ZoneTmpHigh to
expand ZoneHigh. Drop simliar logic fro pmm code because it's not
needed ay more.
This fixes some scalability problems, for example with lots of vcpus,
where seabios runs out of memory due to large smbios/acpi tables.
lots of vcpus is a bit vague, it would nice to have a reproducer CLI mentioned here or event better in a commit message.
Gerd Hoffmann (2):
malloc: add on demand ZoneHigh expansion support
Revert "pmm: use tmp zone on oom"
src/malloc.c | 16 ++++++++++++++++
src/pmm.c | 13 -------------
2 files changed, 16 insertions(+), 13 deletions(-)
On Thu, Apr 21, 2022 at 11:33:24AM +0200, Gerd Hoffmann wrote:
When running out of memory get a chunk of memory from ZoneTmpHigh to expand ZoneHigh. Drop simliar logic fro pmm code because it's not needed ay more.
This fixes some scalability problems, for example with lots of vcpus, where seabios runs out of memory due to large smbios/acpi tables.
I'm not sure this is a good idea, because it could cause subtle issues with reproducibility.
SeaBIOS does not have a deterministic ordering to memory allocations because of its implementation of "threads" (coroutines). Should permanent allocations need to spill over to ZoneTmpHigh then it will likely result in a fragmented e820 memory map. In that case, there is a good chance that different bootups will have different e820 maps, which may result in different OS behavior.
The goal of ZoneHigh was to be the maximum amount of space needed. Unused space gets returned to the e820 map before boot, so there is generally not much harm in increasing it. Order of allocations in the ZoneHigh region is less important because we generally don't free allocations in that zone.
IIRC, the pmm ZoneTmpHigh hack was primarily intended for ridiculously large allocations (like framebuffers) where allocating from the e820 map was the only feasible solution.
What's using the ZoneHigh region that is so large that we need to expand it?
-Kevin
Hi,
Unused space gets returned to the e820 map before boot, so there is generally not much harm in increasing it.
Ah, missed that detail ...
So simply bumping it to 1M or so is fine and does not waste memory?
What's using the ZoneHigh region that is so large that we need to expand it?
zonehigh is 256k. Largest allocation usually are the acpi tables (128k for a typical q35 config) and that is the one which typically fails when the other allocations sum up to > 128k so there is less than 128k free space left.
Lots of vcpus (~800 IIRC) leading to large smbios tables are one way to trigger this.
I've also seen allocation failures with many disk devices, although I'm not sure whenever that was zonehigh or zonelow. And the change to only initialize bootable disks should help with that one too.
take care, Gerd