The attached patch by Peter Jones pjones@redhat.com implements intelligent placement of the ramdisk in memory during boot of an ELF image created by mkelfimage.
mkelfimage prior to this patch defaults to a hardcoded 1MB which doesn't quite work in all circumstances. The kernel as it is decompressed can overwrite part of the ramdisk. Peter says that pretty much all other boot loaders already do this intelligent detection instead of native hard coding.
This patch allows a wrapped Fedora i586 kernel to boot on both KVM and Artec Group's Thincan DBE61C (coreboot). It probably needs more testing in other circumstances.
Thanks, Warren Togami wtogami@redhat.com
On Tue, Jun 10, 2008 at 12:47:38PM -0400, Warren Togami wrote:
The attached patch by Peter Jones pjones@redhat.com implements intelligent placement of the ramdisk in memory during boot of an ELF image created by mkelfimage.
I appreciate this patch!
But can you (or Peter) please describe the algorithm in english?
I don't like a commit message that claims code to be intelligent.. :p
This patch allows a wrapped Fedora i586 kernel to boot on both KVM and Artec Group's Thincan DBE61C (coreboot). It probably needs more testing in other circumstances.
Thanks again for the patch. See below for some comments.
diff -up mkelfImage-2.7/linux-i386/convert_params.c.ramdisk_base mkelfImage-2.7/linux-i386/convert_params.c --- mkelfImage-2.7/linux-i386/convert_params.c.ramdisk_base 2006-03-27 18:44:59.000000000 -0500 +++ mkelfImage-2.7/linux-i386/convert_params.c 2008-05-21 12:55:44.000000000 -0400 @@ -1301,6 +1301,44 @@ static void query_firmware_values(struct
}
+static void relocate_ramdisk(struct param_info *info) +{
- struct e820entry *e820_map;
- struct e820entry *highest = 0;
- unsigned long load_addr;
- int i;
- e820_map = info->real_mode->e820_map;
+#if 0
- printf("initrd_start = 0x%lx\n", info->real_mode->initrd_start);
- printf("real_mode->e820_map_nr: %d\n", info->real_mode->e820_map_nr);
+#endif
I think these debugging messages should either be hooked to some verbose option or simply removed.
- for (i = 0; i < info->real_mode->e820_map_nr; i++) {
if (e820_map[i].type != E820_RAM)
continue;
+#if 0
printf("addr: 0x%lx len: %x\n", e820_map[i].addr, e820_map[i].size);
+#endif
..as should this.
if (highest && e820_map[i].addr < highest->addr)
continue;
if (e820_map[i].size < info->real_mode->initrd_size)
continue;
Find the highest free RAM block in the e820 map which is big enough to hold the initrd..
if (e820_map[i].addr + info->real_mode->initrd_size >= 0x38000000)
continue;
..what is this 3.5MB limit about? Also might this condition be sensitive to an overflow error?
highest = &e820_map[i];
- }
- if (highest == 0)
return;
- load_addr = highest->addr + highest->size;
- load_addr -= info->real_mode->initrd_size;
- load_addr &= ~0xfffUL;
Load the initrd to the top of this RAM block on a 4k boundary.
- memcpy((void *)load_addr, (void *)info->real_mode->initrd_start, info->real_mode->initrd_size);
- printf("relocating ramdisk to 0x%lx\n", load_addr);
- info->real_mode->initrd_start = load_addr;
+} /*
- Debug
- =============================================================================
@@ -1533,6 +1571,10 @@ void *convert_params(unsigned type, void query_firmware_class(&info); query_firmware_values(&info); query_bootloader_values(&info);
if (info.real_mode->initrd_size) {
/* Make sure the initrd is in a relatively safe place. */
relocate_ramdisk(&info);
}
/* Do the hardware setup that linux might forget... */ hardware_setup(&info);
diff -up mkelfImage-2.7/linux-i386/mkelf-linux-i386.c.ramdisk_base mkelfImage-2.7/linux-i386/mkelf-linux-i386.c --- mkelfImage-2.7/linux-i386/mkelf-linux-i386.c.ramdisk_base 2006-03-17 09:08:22.000000000 -0500 +++ mkelfImage-2.7/linux-i386/mkelf-linux-i386.c 2008-05-21 10:47:42.000000000 -0400 @@ -352,6 +352,9 @@ int linux_i386_mkelf(int argc, char **ar */ params->initrd_start = params->initrd_size = 0; if (ramdisk_size) {
while (ramdisk_base <= kernel_size)
ramdisk_base <<= 1;
Does this start with 1MB, then try 2MB, 4MB and so on?
phdr[index].p_paddr = ramdisk_base; phdr[index].p_vaddr = ramdisk_base; phdr[index].p_filesz = ramdisk_size;
Please keep in mind to also include Signed-off-by: lines from all who have touched the patch. Please see http://www.coreboot.org/Development_Guidelines#Sign-off_Procedure
Thanks again for your contribution!
//Peter
Peter Stuge wrote:
On Tue, Jun 10, 2008 at 12:47:38PM -0400, Warren Togami wrote:
The attached patch by Peter Jones pjones@redhat.com implements intelligent placement of the ramdisk in memory during boot of an ELF image created by mkelfimage.
I appreciate this patch!
But can you (or Peter) please describe the algorithm in english?
I don't like a commit message that claims code to be intelligent.. :p
I personally don't understand the algorithm beyond your interpretation.
Peter explained only a few bits below. Lots of this algorithm is simply how other bootloaders have worked for years. This was just a simple way to implement something known to work.
In any case it is certainly better than the previous hard coded value.
I hope others test this patch so we can be certain it doesn't break anyone where it previously worked. It shouldn't.
diff -up mkelfImage-2.7/linux-i386/convert_params.c.ramdisk_base mkelfImage-2.7/linux-i386/convert_params.c --- mkelfImage-2.7/linux-i386/convert_params.c.ramdisk_base 2006-03-27 18:44:59.000000000 -0500 +++ mkelfImage-2.7/linux-i386/convert_params.c 2008-05-21 12:55:44.000000000 -0400 @@ -1301,6 +1301,44 @@ static void query_firmware_values(struct
}
+static void relocate_ramdisk(struct param_info *info) +{
- struct e820entry *e820_map;
- struct e820entry *highest = 0;
- unsigned long load_addr;
- int i;
- e820_map = info->real_mode->e820_map;
+#if 0
- printf("initrd_start = 0x%lx\n", info->real_mode->initrd_start);
- printf("real_mode->e820_map_nr: %d\n", info->real_mode->e820_map_nr);
+#endif
I think these debugging messages should either be hooked to some verbose option or simply removed.
For merging with upstream I would just remove these blocks unless you already have an established way of building with runtime debug messages. Just go ahead and change it as you see fit for merging.
if (e820_map[i].addr + info->real_mode->initrd_size >= 0x38000000)
continue;
..what is this 3.5MB limit about? Also might this condition be sensitive to an overflow error?
Peter isn't sure why 3.5MB specifically. He said it is known to work for many years in other boot loaders. There may have been a historic reason for this particular value but he doesn't know.
diff -up mkelfImage-2.7/linux-i386/mkelf-linux-i386.c.ramdisk_base mkelfImage-2.7/linux-i386/mkelf-linux-i386.c --- mkelfImage-2.7/linux-i386/mkelf-linux-i386.c.ramdisk_base 2006-03-17 09:08:22.000000000 -0500 +++ mkelfImage-2.7/linux-i386/mkelf-linux-i386.c 2008-05-21 10:47:42.000000000 -0400 @@ -352,6 +352,9 @@ int linux_i386_mkelf(int argc, char **ar */ params->initrd_start = params->initrd_size = 0; if (ramdisk_size) {
while (ramdisk_base <= kernel_size)
ramdisk_base <<= 1;
Does this start with 1MB, then try 2MB, 4MB and so on?
According to Peter there this is only a simple way that avoids alignment earlier. While not expressly necessary (you could add), this requires less code, and more importantly, it is known to work.
phdr[index].p_paddr = ramdisk_base; phdr[index].p_vaddr = ramdisk_base; phdr[index].p_filesz = ramdisk_size;
Please keep in mind to also include Signed-off-by: lines from all who have touched the patch. Please see http://www.coreboot.org/Development_Guidelines#Sign-off_Procedure
OK, he agreed. Attached is the same patch with his signing off (but not me since I didn't change it.)
Warren Togami wtogami@redhat.com
I hope others test this patch so we can be certain it doesn't break anyone where it previously worked. It shouldn't.
I was hoping it would fix qemu's problem with a LAB kernel, but it doesn't change the behavior.
if (e820_map[i].addr + info->real_mode->initrd_size >=
0x38000000)
continue;
..what is this 3.5MB limit about? Also might this condition be sensitive to an overflow error?
Peter isn't sure why 3.5MB specifically. He said it is known to work for many years in other boot loaders. There may have been a historic reason for this particular value but he doesn't know.
I read that value as 896MB, for what it's worth.
Thanks, Myles
Myles Watson wrote:
I hope others test this patch so we can be certain it doesn't break anyone where it previously worked. It shouldn't.
I was hoping it would fix qemu's problem with a LAB kernel, but it doesn't change the behavior.
I have no idea what a LAB kernel is, but if your qemu is using the qemu standard BIOS (not coreboot) you could try wraplinux instead of mkelfimage.
http://www.kernel.org/pub/linux/utils/boot/wraplinux/ http://git.etherboot.org/?p=wraplinux.git;a=summary
Warren Togami wtogami@redhat.com
I was hoping it would fix qemu's problem with a LAB kernel, but it
doesn't
change the behavior.
I have no idea what a LAB kernel is, but if your qemu is using the qemu standard BIOS (not coreboot) you could try wraplinux instead of mkelfimage.
Thanks for the pointer. I'm curious why it wouldn't work with coreboot, since it produces an ELF or NBI file.
Thanks, Myles
http://www.kernel.org/pub/linux/utils/boot/wraplinux/ http://git.etherboot.org/?p=wraplinux.git;a=summary
Warren Togami wtogami@redhat.com
-- coreboot mailing list coreboot@coreboot.org http://www.coreboot.org/mailman/listinfo/coreboot
Myles Watson wrote:
I was hoping it would fix qemu's problem with a LAB kernel, but it
doesn't
change the behavior.
I have no idea what a LAB kernel is, but if your qemu is using the qemu standard BIOS (not coreboot) you could try wraplinux instead of mkelfimage.
Thanks for the pointer. I'm curious why it wouldn't work with coreboot, since it produces an ELF or NBI file.
https://fedorahosted.org/k12linux/wiki/Notes/Coreboot I wrote some notes here. H Peter Anvin (of kernel fame) wrote wraplinux from scratch because mknbi has such terrible code and has been dead for so long. He was unaware of the existence of mkelfimage at the time.
He said wraplinux relies upon "int 15h, AX=2401h (unless A20 is enabled) and int 15h, AX=E820h (for probing memory)". He indicated that he really dislikes the design of coreboot and will likely never support it.
wraplinux worked for me with qemu's standard BIOS + Etherboot like used in kvm.
Warren Togami wtogami@redhat.com
On Tue, Jun 10, 2008 at 05:58:58PM -0400, Warren Togami wrote:
H Peter Anvin (of kernel fame) wrote wraplinux from scratch because mknbi has such terrible code and has been dead for so long.
Sweet!
He said wraplinux relies upon "int 15h, AX=2401h (unless A20 is enabled) and int 15h, AX=E820h (for probing memory)". He indicated that he really dislikes the design of coreboot and will likely never support it.
I would very much appreciate if you could expand on this.
In general I think more communication involving coreboot and kernel developers is a good thing because the projects can be quite closely connected.
//Peter
Peter Stuge wrote:
On Tue, Jun 10, 2008 at 05:58:58PM -0400, Warren Togami wrote:
H Peter Anvin (of kernel fame) wrote wraplinux from scratch because mknbi has such terrible code and has been dead for so long.
Sweet!
He said wraplinux relies upon "int 15h, AX=2401h (unless A20 is enabled) and int 15h, AX=E820h (for probing memory)". He indicated that he really dislikes the design of coreboot and will likely never support it.
I would very much appreciate if you could expand on this.
In general I think more communication involving coreboot and kernel developers is a good thing because the projects can be quite closely connected.
Yes, and therein lies the problem. They shouldn't have to be.
APIs exist to decouple modules. Use them. I understand there is finally some serious work going on in that vein in the coreboot community; this makes sense.
The "no API" model is idiotic.
-hpa
On Tue, Jun 10, 2008 at 10:27:19PM -0700, H. Peter Anvin wrote:
APIs exist to decouple modules. Use them. I understand there is finally some serious work going on in that vein in the coreboot community; this makes sense.
The "no API" model is idiotic.
You don't feel it is possible/desirable to have all hardware code in the operating system so that boot code can be oneshot?
(of course the boot code still has to hand over state information)
What would the ideal abstraction be then?
//Peter
On 11/06/08 12:37 +0200, Peter Stuge wrote:
On Tue, Jun 10, 2008 at 10:27:19PM -0700, H. Peter Anvin wrote:
APIs exist to decouple modules. Use them. I understand there is finally some serious work going on in that vein in the coreboot community; this makes sense.
The "no API" model is idiotic.
You don't feel it is possible/desirable to have all hardware code in the operating system so that boot code can be oneshot?
(of course the boot code still has to hand over state information)
What would the ideal abstraction be then?
Interestingly enough, this thread is intertwining with the "coreboot BIOSisms" threat, and Peter has gotten his wish.. "Of course, as mere producers we don't care too much, it would be interesting to discuss also with the consumers". Well, here is the owner of the init code for the only operating system we can boot. I think he qualifies as an interested party.
I don't presume to speak for hpa, but I do think I feel his concern. His areas of interest aren't so very much different then ours. With syslinux and the kernel init code, his main goal is to take a decidedly complex architecture and beat it into sanity. And for the most part, it will just work on any given x86 architecture - hpa isn't running around trying Linux on every new platform that is coming out - he trusts that the BIOS will give him the information that he needs.
Now, you can argue that that the API is old and archaic and stupid. Those are legitimate arguments, but they aren't interesting for the guy who doesn't want to have to worry about changing his code every time a BIOS software developer thinks he has something clever.
This is a problem we are already struggling with. Why doesn't gPXE work out of the box? Why wouldn't libpayload + coreinfo work on non coreboot BIOSen (by all accounts it should). The first answer is because we need to add coreboot specific code to gPXE, and the second answer is that we have _only_ coreboot specific code in libpayload. These are probably not optimal solutions.
Now, I don't know how hpa feels about the traditional BIOS API - perhaps he too would be open to a new way of doing things. But even if he is, there are lots of API consumers right behind him (ACPI, mptable, etc, etc) that would need convincing as well. Its a long road to travel, and we may never reach the end successfully. I think what Kevin is doing is exactly the right course of action - we need to play well with others, sooner rather then later and then slowly bring them along with us. That is the only way we can be successful. Storming in and saying "we are clever, support us or die!" is not a great way to win over the world.
I'll leave you with a thought - the first guy who stood up and said "Hey lets overthrow the king" got his head cut off. It takes consensus to make permanent change.
Jordan
Jordan Crouse wrote:
On 11/06/08 12:37 +0200, Peter Stuge wrote:
On Tue, Jun 10, 2008 at 10:27:19PM -0700, H. Peter Anvin wrote:
APIs exist to decouple modules. Use them. I understand there is finally some serious work going on in that vein in the coreboot community; this makes sense.
The "no API" model is idiotic.
You don't feel it is possible/desirable to have all hardware code in the operating system so that boot code can be oneshot?
(of course the boot code still has to hand over state information)
What would the ideal abstraction be then?
Interestingly enough, this thread is intertwining with the "coreboot BIOSisms" threat, and Peter has gotten his wish.. "Of course, as mere producers we don't care too much, it would be interesting to discuss also with the consumers". Well, here is the owner of the init code for the only operating system we can boot. I think he qualifies as an interested party.
I don't presume to speak for hpa, but I do think I feel his concern. His areas of interest aren't so very much different then ours. With syslinux and the kernel init code, his main goal is to take a decidedly complex architecture and beat it into sanity. And for the most part, it will just work on any given x86 architecture - hpa isn't running around trying Linux on every new platform that is coming out - he trusts that the BIOS will give him the information that he needs.
Now, you can argue that that the API is old and archaic and stupid. Those are legitimate arguments, but they aren't interesting for the guy who doesn't want to have to worry about changing his code every time a BIOS software developer thinks he has something clever.
This is a problem we are already struggling with. Why doesn't gPXE work out of the box? Why wouldn't libpayload + coreinfo work on non coreboot BIOSen (by all accounts it should). The first answer is because we need to add coreboot specific code to gPXE, and the second answer is that we have _only_ coreboot specific code in libpayload. These are probably not optimal solutions.
Now, I don't know how hpa feels about the traditional BIOS API - perhaps he too would be open to a new way of doing things. But even if he is, there are lots of API consumers right behind him (ACPI, mptable, etc, etc) that would need convincing as well. Its a long road to travel, and we may never reach the end successfully. I think what Kevin is doing is exactly the right course of action - we need to play well with others, sooner rather then later and then slowly bring them along with us. That is the only way we can be successful. Storming in and saying "we are clever, support us or die!" is not a great way to win over the world.
I'll leave you with a thought - the first guy who stood up and said "Hey lets overthrow the king" got his head cut off. It takes consensus to make permanent change.
Thank you for saying this way more eloquently than I would have (I have gotten a bit hoarse over the years from yelling the same message too many times.) Your description above summarizes the issues really quite nicely.
As far as the BIOS API is concerned, it is archaic, it's ugly, it is ubiquitous, and it works. The latter two means that no matter what happens, I have to support the BIOS API. Anything coreboot does differently means more work (or that I'll simply say "forget about that stupid thing"), *especially* since there is a cost to multiple solutions for anything. As you point out, once you abandon the BIOS model you have a lot of things behind it that need changing, and largely for no good reason.
The fact that legacy BIOS runs in 16-bit mode is almost an advantage, because it has kept it from mushrooming too far out of control. It's a pretty small API and relatively well-behaved, believe it or not. Furthermore, it is quite limited, which has prevented people from abusing it too badly -- for one thing, noone expects legacy BIOS to be fast.
I would absolutely love for Coreboot to get involved as a full-fledged player on the proper BIOS stage, first of all because it would give a full-featured open source firmware solution, and second of all because it might give some leverage when it comes to driving that platform; Intel used to sort of take that role until they got obsessed with EFI.
As far as implementation is concerned - there is no law that says that the legacy BIOS needs to be written in 16-bit code. One could even trap to SMM on a BIOS interrupt and implement the actual service routines in that mode.
-hpa
Peter Stuge wrote:
He said wraplinux relies upon "int 15h, AX=2401h (unless A20 is enabled) and int 15h, AX=E820h (for probing memory)". He indicated that he really dislikes the design of coreboot and will likely never support it.
I would very much appreciate if you could expand on this.
I should point out something here:
wraplinux uses a subset of the BIOS calls used by the Linux setup code itself. If you can boot a Linux kernel from the standard 16-bit entry point, wraplinux should work out of the box.
-hpa
On Tue, 10 Jun 2008 12:47:38 -0400, Warren Togami wtogami@redhat.com wrote:
The attached patch by Peter Jones pjones@redhat.com implements intelligent placement of the ramdisk in memory during boot of an ELF image created by mkelfimage.
mkelfimage prior to this patch defaults to a hardcoded 1MB which doesn't quite work in all circumstances. The kernel as it is decompressed can overwrite part of the ramdisk. Peter says that pretty much all other boot loaders already do this intelligent detection instead of native hard coding.
This patch allows a wrapped Fedora i586 kernel to boot on both KVM and Artec Group's Thincan DBE61C (coreboot). It probably needs more testing in other circumstances.
Cool. Good work!