Hi,
I have been working on an SMM handler for coreboot over the last week, so we can support laptops and mainboards with power management features that would otherwise be hard or impossible to support in the near future.
The SMM handler I wrote switches to 32bit protected mode and all SMI# handling is written in C code, compiled with normal gcc without the gcc16 hack or similar nasty things. Thus it is just another exception handler, like the ones we have in coreboot already.
One difference is that the SMM handler stays operational once the OS is loaded. The SMM stub living at 0xa8000 currently jumps to the C handler in normal coreboot code in ram, which starts at _RAMBASE. So in order to be able to use the SMM handler, that memory (code, data, bss, and heap; not the stack) needs to stay in place and untouched for the current scenario to work.
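(To make the structure concrete, here is a minimal sketch of what the C side of this looks like; the helper names and status bits are made up for illustration, they are not the actual symbols in my patch:)

#include <stdint.h>

/* Placeholder bits and helpers -- the real names and registers are
 * chipset specific and not taken from the actual patch. */
#define SMI_ON_APMC (1 << 5)
#define SMI_ON_PM1  (1 << 6)

extern uint32_t southbridge_smi_status(void);
extern void southbridge_smi_ack(uint32_t status);
extern void handle_apmc(unsigned int cpu);
extern void handle_pm1(unsigned int cpu);

/* Called by the per-CPU asm stub at 0xa8000 after it has switched to
 * 32bit protected mode and set up a small stack. */
void smi_handler(unsigned int cpu)
{
        uint32_t status = southbridge_smi_status(); /* latched SMI# causes */

        if (status & SMI_ON_APMC)   /* e.g. the OS wrote the APM command port */
                handle_apmc(cpu);
        if (status & SMI_ON_PM1)    /* power button, RTC alarm, ...           */
                handle_pm1(cpu);

        southbridge_smi_ack(status); /* clear the causes, re-arm SMI#         */
        /* On return the stub executes RSM and the OS continues. */
}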
Now, memory consumption of the ram part of coreboot is quite tough:
On an example mainboard, I get these values:
code:  102464
data:   53088
bss:    10248
stack:  32768
heap:   32768
------------------
total: 231336
Usually code starts at 0x4000 (_RAMBASE) and, in the above example, reaches up to 0x3e000, occupying almost 4 64k "segments" (0x3a000/237568 bytes)
My first thought was: we could add that memory as reserved in the coreboot table / e820. coreboot keeps a lot of information in memory already, and expects the OS / payload to take care that it is not overwritten:
- mptable
- pirq
- dsdt, and the other ACPI tables
- DMI (required for ACPI on 32bit systems with newer Linux kernels)
- I remember some EBDA issues in the last few days on this list, too
- last but not least of course the coreboot table.
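(As a sketch of what "adding that memory as reserved" would amount to, assuming an lb_add_memory_range()-style helper and linker symbols along these lines; names are illustrative, not necessarily the exact ones in the tree:)

#include <stdint.h>

#define LB_MEM_RESERVED 2

struct lb_memory;  /* opaque here */
extern void lb_add_memory_range(struct lb_memory *mem, uint32_t type,
                                uint64_t start, uint64_t size);
/* Stand-ins for whatever the linker script really exports. */
extern char _text[], _eheap[];

static void reserve_resident_coreboot(struct lb_memory *mem)
{
        uint64_t start = (uintptr_t)_text;                    /* e.g. 0x4000        */
        uint64_t size  = (uintptr_t)_eheap - (uintptr_t)_text; /* code+data+bss+heap */

        /* Mark it reserved so e820 / the payload won't hand it to the OS. */
        lb_add_memory_range(mem, LB_MEM_RESERVED, start, size);
}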
So far, coreboot has been "tricking around" to put these in places that don't hurt and where the OS is going to find them. This has worked in a rough-and-ready manner, but it will fail us in the future, or at least keep us doing quick'n'dirty fixes after every occurrence of a problem.
But to conclude, keeping (parts of) coreboot resident in memory is nothing that SMM would introduce. We have been doing this for many years.
Another alternative to keeping the full coreboot around would be to make the SMM handler self-contained. This would mean the SMM handler could not use coreboot's functions like printk_debug and pci_read_config32, it could not use the device tree, and it would become more complex, because for some information we would have to reprobe the hardware or parse the coreboot table.
In the case of the SMM handler, this would also confine us, because the actual SMI# handling code (written in C) would not be shared between CPUs but would have to be duplicated for every CPU core. However, my current approach only keeps a very small amount of code per CPU, just enough to enter gcc-compiled functions and return from them cleanly.
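(Roughly, the only per-CPU pieces are a save-state area and a tiny stack, everything else is shared; a condensed, C-only illustration with invented names, the real entry code is of course assembler:)

#include <stdint.h>

#define MAX_CPUS       4      /* made-up for the example */
#define SMM_STACK_SIZE 1024

/* Shared, gcc-compiled body -- one copy serves every core. */
extern void smi_handler(unsigned int cpu);

/* The only thing duplicated per core: a tiny stack (plus the save
 * state the CPU writes itself on SMI#). */
static uint8_t smm_stack[MAX_CPUS][SMM_STACK_SIZE];

/* What the per-CPU entry stub conceptually does before jumping to C. */
void smm_entry(unsigned int cpu)
{
        void *stack_top = &smm_stack[cpu][SMM_STACK_SIZE];
        (void)stack_top;        /* the asm stub loads this into %esp */

        smi_handler(cpu);       /* all actual SMI# handling is shared */
        /* ...restore state and RSM (asm in reality). */
}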
One of the questions in my mind is: where should we put the coreboot image, if we want to keep it around?
A little excerpt from coreboot v2:
I know the problem of where to put coreboot has been thought about before. elfboot() relocates coreboot to another place when loading an ELF binary that demands the space where coreboot lives:
- coreboot tries to load a segment and finds out that it is in the way.
- coreboot copies itself to a new position.
- coreboot jumps into the assembler handler in jmp_to_elf_entry at the new position.
- coreboot tries to start the ELF binary.
- If it fails, it overwrites the loaded ELF binary by copying itself back and jumping to the original position.
This is quite an interesting concept, but it also makes clear that the ram portion of coreboot itself ("stage2") can not be relocated freely in memory. Yet.
Since we know how big our RAM is when we copy coreboot to RAM, I suggest that we copy coreboot to the end of memory and run it from there. It is a pretty good assumption that no payload will require that space. During memory map creation, we just reserve 256k at the upper end, and we're good.
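(A sketch of the idea, assuming a hypothetical get_top_of_low_ram() helper that returns the top of usable RAM below 4G; stage2 would of course have to be relocatable for the memcpy to be safe:)

#include <stdint.h>
#include <string.h>

#define RESIDENT_SIZE (256 * 1024)    /* the 256k reserved at the top */

extern uint64_t get_top_of_low_ram(void);   /* hypothetical helper     */
extern char _text[], _eheap[];              /* stand-ins for linker symbols */

static void *relocate_stage2_to_top(void)
{
        uintptr_t top = (uintptr_t)get_top_of_low_ram();
        void *dst     = (void *)(top - RESIDENT_SIZE);
        size_t len    = (size_t)(_eheap - _text);

        memcpy(dst, _text, len);  /* only safe once stage2 is position
                                     independent or properly relocated */

        /* The memory map then reports [top - 256k, top) as reserved. */
        return dst;
}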
Regards, Stefan
On Sun, Jul 27, 2008 at 7:36 AM, Stefan Reinauer stepan@coresystems.de wrote:
Since we know how big our RAM is when we copy coreboot to RAM, I suggest that we copy coreboot to the end of memory and run it from there. It is a pretty good assumption that no payload will require that space. During memory map creation, we just reserve 256k at the upper end, and we're good.
works for me. I like it.
ron
-----Original Message-----
From: coreboot-bounces+joe=settoplinux.org@coreboot.org On Behalf Of ron minnich
Sent: Sunday, July 27, 2008 11:56 AM
To: Stefan Reinauer
Cc: Coreboot
Subject: Re: [coreboot] [RFC] SMM handling and resident coreboot
On Sun, Jul 27, 2008 at 7:36 AM, Stefan Reinauer stepan@coresystems.de wrote:
Since we know how big our RAM is when we copy coreboot to RAM, I suggest that we copy coreboot to the end of memory and run it from there. It is a pretty good assumption that no payload will require that space. During memory map creation, we just reserve 256k at the upper end, and we're good.
works for me. I like it.
Yes, I like it a lot too :-)

--
Thanks,
Joseph Smith
Set-Top-Linux
www.settoplinux.org
On 27.07.2008 17:55, ron minnich wrote:
On Sun, Jul 27, 2008 at 7:36 AM, Stefan Reinauer stepan@coresystems.de wrote:
Since we know how big our RAM is when we copy coreboot to RAM, I suggest that we copy coreboot to the end of memory and run it from there. It is a pretty good assumption that no payload will require that space. During memory map creation, we just reserve 256k at the upper end, and we're good.
works for me. I like it.
AFAICS this means for v3 that we require stage2 to be PIC like initram. I'm not entirely happy about that because of possible toolchain issues (mostly untested path).
Regards, Carl-Daniel
On 28/07/08 08:27 +0200, Carl-Daniel Hailfinger wrote:
On 27.07.2008 17:55, ron minnich wrote:
On Sun, Jul 27, 2008 at 7:36 AM, Stefan Reinauer stepan@coresystems.de wrote:
Since we know how big our RAM is when we copy coreboot to RAM, I suggest that we copy coreboot to the end of memory and run it from there. It is a pretty good assumption that no payload will require that space. During memory map creation, we just reserve 256k at the upper end, and we're good.
works for me. I like it.
AFAICS this means for v3 that we require stage2 to be PIC like initram. I'm not entirely happy about that because of possible toolchain issues (mostly untested path).
That is part of the growing up process. The v3 model needs to be, above all things, flexible. Every time we think we have it stable enough, something new is going to come along and upset that. We will regress many, many times - the important thing is that the design emerges stronger than it was before.
Jordan
Actually v3 has one user. Our Sandia Labs compute miniclusters use the digital logic boards and run v3. I just remembered this the other day. I forgot it because v3 has been so usable on those boards we no longer think about it.
If you think about it, SMI, VSA, ACPI, EFI, and even the old BIOS -- all are there to virtualize resources that in some cases don't even exist, but in other cases are non-standard. I am wondering about stepping back from the problem and going at it with this approach -- that a runtime BIOS is really there to virtualize resources. Viewed this way, the answer is somewhat easier. The runtime BIOS can be a hypervisor. The models supported by these many varying systems are viewed as subset functions of a hypervisor.
If this seems crazy, it is actually the approach used in the PS/3. Is that good or bad?
Where do we get a hypervisor? From the Xen guys, for one choice.
Just a thought. Rather than merely redo old ideas like SMI and old-style BIOS, we could step out ahead and do something very powerful.
ron
On 28/07/08 08:14 -0700, ron minnich wrote:
Actually v3 has one user. Our Sandia Labs compute miniclusters use the digital logic boards and run v3. I just remembered this the other day. I forgot it because v3 has been so usable on those boards we no longer think about it.
If you think about it, SMI, VSA, ACPI, EFI, and even the old BIOS -- all are there to virtualize resources that in some cases don't even exist, but in other cases are non-standard. I am wondering about stepping back from the problem and going at it with this approach -- that a runtime BIOS is really there to virtualize resources. Viewed this way, the answer is somewhat easier. The runtime BIOS can be a hypervisor. The models supported by these many varying systems are viewed as subset functions of a hypervisor.
If this seems crazy, it is actually the approach used in the PS/3. Is that good or bad?
Where do we get a hypervisor? From the Xen guys, for one choice.
Just a thought. Rather than merely redo old ideas like SMI and old-style BIOS, we could step out ahead and do something very powerful.
If you really think about it, though, legacy BIOS services aren't so much like a hypervisor; rather, a hypervisor is more like a legacy BIOS - it obfuscates the underlying hardware by acting as a go-between for the hardware and the operating systems. What we have today is really DOS on steroids.
I like where you are going, though - thinking of the firmware as a living, breathing entity throughout the life of the system is radically different than where we are now, and it's good to consider the consequences. Unfortunately, we don't get much latitude in designing the interfaces that we are given (we still need to play in SMM's sandbox), but we can design the backend to provide the services we need while still going light on resources. I think the first step is to figure out what such a "hypervisor" would look like, and where it would live. Then we start plugging in common services and see how well it reacts.
Jordan
ron minnich wrote:
If you think about it, SMI, VSA, ACPI, EFI, and even the old BIOS -- all are there to virtualize resources that in some cases don't even exist, but in other cases are non-standard. I am wondering about stepping back from the problem and going at it with this approach -- that a runtime BIOS is really there to virtualize resources.
Yes, but that is also what drivers do. They unify the way your high-level software talks to the hardware. On top of that, multitasking pretends to provide several computers, each running a single program.
Viewed this way, the answer is somewhat easier. The runtime BIOS can be a hypervisor. The models supported by these many varying systems are viewed as subset functions of a hypervisor.
You can use some of the features to model a hypervisor, but the fact that a bios consists of drivers that control hardware has nothing directly to do with a hypervisor. Running a hypervisor on top of the hardware initialization is a good approach, but basically what it boils down to is: where to get the drivers.
If this seems crazy, it is actually the approach used in the PS/3. is that good or bad?
What exactly?
Where do we get a hypervisor? From the Xen guys, for one choice.
Just a thought. Rather than merely redo old ideas like SMI and old-style BIOS, we could step out ahead and do something very powerful.
We could probably teach Xen to understand SMI#s occurring and handle their virtual instances accordingly. But what would be the gain?
I fear there is some confusion about what SMI really is.
SMI is not an old style idea. It's simply a mechanism to get an interrupt when for example your battery is low, or a hot plug event has happened.
And the nice thing is: It is a very standard way of doing things. This will work across cpus and chipsets with a moderate amount of changes to make things happen.
Well, you can sure go the way of polling if you say that having an interrupt is an old style idea, or say that you want to implement that interrupt handler in your favourite operating system. Unfortunately this is not the way operating systems handle things these days.
If you are on an Intel chipset and you want to implement ACPI based on the ACPI standard and Linux implementation, you will have to use an SMI handler, there is no way whatsoever around this. Well, except writing native drivers for highly mainboard and wiring specific stuff in Linux, in Plan9, in *BSD, in Windows XP, in Windows Vista, in [fill your favourite OS or payload].
Yes, it steals cycles. Yes, it is not good for hard realtime applications. Just like any other interrupt. Well, obviously if you want to run RT applications on a current desktop or laptop mainboard you have to ensure that your application does not care for power management or anything related to it.
If you want power management features, you will either have to throw a lot of hardware development of the last 10 years overboard and start with a new revolutionary approach, or you will need someone to take care of those events, even if the OS is not capable of taking care.
The good thing is: you're going to have the choice. With other approaches out there (your fav. bios vendor) you end up with an SMI handler even if you boot your system with acpi=off.
I believe at some point we should very well discuss how we can really change the whole picture, by stepping back, looking at the big picture, and getting something really smart out of it that no one else has thought about before. But, if nothing else, this is a good finger exercise for us as a community to learn how things generally work, and then we can take the next step.
Stefan
If you are on an Intel chipset and you want to implement ACPI based on the ACPI standard and Linux implementation, you will have to use an SMI handler, there is no way whatsoever around this. Well, except writing native drivers for highly mainboard and wiring specific stuff in Linux, in Plan9, in *BSD, in Windows XP, in Windows Vista, in [fill your favourite OS or payload].
Another point on the Intel part: SMM and SMI handlers are also used with Intel's vBIOS (VGA BIOS) to handle built-in drivers for LVDS screens and TV-out chips (they call these modules VBTs and FlexAIMs). I have been doing a lot of research on this lately, and it would be great to implement this in coreboot. That's my 2 cents worth.
Keeping coreboot running once the OS boots?
This is definitely a total change in our philosophy, but I am ready for it, I guess, after fighting it for 9 years. There seems to be no escape from runtime support for OSes.
I'm not really thinking in terms of anything we have now, such as "today's Xen kernel" or whatever. But virtualization as a unifying theme for all the PC kludgery -- BIOS, SMI, EFI, etc. -- is attractive. IBM has used virtualization for 40 years now, to good effect, to insulate OSes from hardware strangeness. Sony will use it to insulate the PS/3 linux from hardware changes -- they are providing a 10-year guarantee that linux will not need changes to run on PS/3.
Right now on the PC there are many types of virtualization --- and I really think this is true virtualization: when the CPU does an I/O to a keyboard chip that does not exist, and gets a result back, what else do you call it? That is what the USB stacks in commercial BIOSes do.
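(A toy illustration of that kind of trap, with invented names: an SMI I/O-trap handler answering a read from port 0x60 out of a buffer filled by a USB HID driver running in SMM:)

#include <stdint.h>

/* usb_hid_pop_scancode() is a made-up hook into a USB HID driver that
 * would run inside SMM; returns -1 when no key is pending. */
extern int usb_hid_pop_scancode(void);

/* Called when the chipset traps an IN from port 0x60 into SMM.
 * 'saved_eax' points at EAX in this CPU's SMM save state. */
void emulate_kbd_data_read(uint32_t *saved_eax)
{
        int sc = usb_hid_pop_scancode();
        uint8_t data = (sc < 0) ? 0x00 : (uint8_t)sc;

        *saved_eax = (*saved_eax & ~0xffu) | data;
        /* After RSM the OS sees the value as if a real 8042 answered. */
}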
There is lots of virtual hardware in our PCs nowadays, and the OSes depend on it. My early hope was that we would free Linux from this model. But Linux now depends on the "Steenkin' BIOS" more than it ever did -- you can't boot a K8 in Linux without ACPI there. Linux dependency on the BIOS is increasing over time.
Anyway, just a random idea, I am going to pursue it and, if there are others interested, let me know. I've done a bit of work with virtualization over the last few years and I can't get this idea out of my head.
thanks
ron
On Mon, 28 Jul 2008 15:09:48 -0700, "ron minnich" rminnich@gmail.com wrote:
Keeping coreboot running once the OS boots?
This is definitely a total change in our philosophy, but I am ready for it, I guess, after fighting it for 9 years. There seems to be no escape from runtime support for OSes.
I'm not really thinking in terms of anything we have now, such as "today's Xen kernel" or whatever. But virtualization as a unifying theme for all the PC kludgery -- BIOS, SMI, EFI, etc. -- is attractive. IBM has used virtualization for 40 years now, to good effect, to insulate OSes from hardware strangeness. Sony will use it to insulate the PS/3 linux from hardware changes -- they are providing a 10-year guarantee that linux will not need changes to run on PS/3.
Right now on the PC there are many types of virtualization --- and I really think this is true virtualization: when the CPU does an I/O to a keyboard chip that does not exist, and gets a result back, what else do you call it? That is what the USB stacks in commercial BIOSes do.
There is lots of virtual hardware in our PCs nowadays, and the OSes depend on it. My early hope was that we would free Linux from this model. But Linux now depends on the "Steenkin' BIOS" more than it ever did -- you can't boot a K8 in Linux without ACPI there. Linux dependency on the BIOS is increasing over time.
Anyway, just a random idea, I am going to pursue it and, if there are others interested, let me know. I've done a bit of work with virtualization over the last few years and I can't get this idea out of my head.
Isn't this what the AVATT GSoC 2008 project is all about? http://www.coreboot.org/AVATT
On Mon, Jul 28, 2008 at 3:23 PM, Joseph Smith joe@settoplinux.org wrote:
Isn't this what the AVATT GSoC 2008 project is all about? http://www.coreboot.org/AVATT
Funny you should mention that, but yes, that was my proposal and the student is doing a great job on it.
ron
On Mon, 28 Jul 2008 15:09:48 -0700, "ron minnich" rminnich@gmail.com wrote:
IBM has used virtualization for 40 years now, to good effect, to insulate OSes from hardware strangeness.
Agreed, I work with IBM's every day :-)
"ron minnich" rminnich@gmail.com writes:
There is lots of virtual hardware in our PCs nowadays, and the OSes depend on it. My early hope was that we would free Linux from this model. But Linux now depends on the "Steenkin' BIOS" more than it ever did -- you can't boot a K8 in Linux without ACPI there. Linux dependency on the BIOS is increasing over time.
Let's just remember that the practical boundary we draw is support for mainboard specific things. We need to do one of a couple of things:
- Pass a device tree (or other data structure) to the kernel with all of the little motherboard specific wiring details to get software that understands the hardware to work.
- Make using and understanding the hardware irrelevant.
- Some level of virtualization.
In general the firmware is the hardest and most dangerous part of the system to update. Plus there is never enough time to develop it. So we need to keep things as simple as we can (for developers at least).
Eric
This thread seems to reinvent VSA.
On Mon, Jul 28, 2008 at 03:09:48PM -0700, ron minnich wrote:
Linux dependency on the BIOS is increasing over time.
The day we accept this is the day we give up on open source, in the sense that we are unable to come up with something superior. Ron, please don't give up.
I agree with Stefan that we have to go through this as a learning exercise. Things will undoubtedly get a lot worse before they get better.
//Peter
Peter Stuge wrote:
This thread seems to reinvent VSA.
VSA is one possible implementation of an SMM handler.
Advocatus diaboli: Just like Windows is a possible implementation of a protected mode OS. Does this make protected mode bad per se?
I agree with Stefan that we have to go through this as a learning exercise. Things will undoubtedly get a lot worse before they get better.
What exactly makes being able to support laptops so bad in your opinion?
On Tue, Jul 29, 2008 at 04:33:15PM +0200, Stefan Reinauer wrote:
This thread seems to reinvent VSA.
VSA is one possible implementation of an SMM handler.
I was thinking about the plugging and services and so on. I have always enjoyed coreboot being SMM free, and I consider that a huge marketing advantage even with the SMM handler being optional.
Also, if it is easy to add SMM code to coreboot I'm afraid it will become a trend and to me, it is not the right fix for anything.
Advocatus diaboli: Just like Windows is a possible implementation of a protected mode OS. Does this make protected mode bad per se?
I like PM, but not SMM.
My beef with SMM besides virtualizing hardware is the segregation and to some degree duplication of logic between OS and $othercode. Just like we enjoy Linux as bootloader because drivers are only in one place, I want to enjoy the operating system doing everything SMM is used for. You know, operations. Yes, it is revolutionary, at least for PCs.
I agree with Stefan that we have to go through this as a learning exercise. Things will undoubtedly get a lot worse before they get better.
What exactly makes being able to support laptops so bad in your opinion?
Not what, but how. Again, it is an inevitable first step, but I don't want to settle down once it is done.
I am afraid that what I consider to be the wrong solution will gather critical mass very quickly because people think it is good enough, and that yet another migration will be too painful. Sorry if I sound too cynical. Blame the kernel guys who show no love. :p
//Peter
On Tue, Jul 29, 2008 at 8:43 AM, Peter Stuge peter@stuge.se wrote:
I was thinking about the plugging and services and so on. I have always enjoyed coreboot being SMM free, and I consider that a huge marketing advantage even with the SMM handler being optional.
Also, if it is easy to add SMM code to coreboot I'm afraid it will become a trend and to me, it is not the right fix for anything.
I'm not really talking SMM. I'm talking about a hypervisor to provide a superset of what SMM and ACPI and all the others provide. This is a research activity and I'll see where it goes. It may go nowhere.
But it will be utterly GPL, and it will either do a far better job, and give us an improved world, or we don't do it.
My beef with SMM besides virtualizing hardware is the segregation and to some degree duplication of logic between OS and $othercode. Just like we enjoy Linux as bootloader because drivers are only in one place, I want to enjoy the operating system doing everything SMM is used for. You know, operations. Yes, it is revolutionary, at least for PCs.
It is, sadly, becoming revolutionary for Linux :-( Linux on Opteron can't really work right without ACPI. And YingHai's last try at fixing that was rejected by Andi Kleen.
But the low level VM layer, done right, can work well: see IBM, who sell billions of dollars' worth of Power, 390, and Cell systems every year that do this.
Anyway, it's not going to happen any time soon, if at all. I'm just looking at it. But our inability to kill ACPI tells me something. Even OLPC has to do ACPI now. The last valiant charge of OFW is going to come to naught in the PC world, I am afraid. Sometimes, the (sub)standards are impossible to kill.
ron
On Tue, Jul 29, 2008 at 9:27 AM, ron minnich rminnich@gmail.com wrote:
On Tue, Jul 29, 2008 at 8:43 AM, Peter Stuge peter@stuge.se wrote:
I was thinking about the plugging and services and so on. I have always enjoyed coreboot being SMM free, and I consider that a huge marketing advantage even with the SMM handler being optional.
Also, if it is easy to add SMM code to coreboot I'm afraid it will become a trend and to me, it is not the right fix for anything.
I'm not really talking SMM. I'm talking about a hypervisor to provide a superset of what SMM and ACPI and all the others provide. This is a research activity and I'll see where it goes. It may go nowhere.
But it will be utterly GPL, and it will either do a far better job, and give us an improved world, or we don't do it.
My beef with SMM besides virtualizing hardware is the segregation and to some degree duplication of logic between OS and $othercode. Just like we enjoy Linux as bootloader because drivers are only in one place, I want to enjoy the operating system doing everything SMM is used for. You know, operations. Yes, it is revolutionary, at least for PCs.
It is, sadly, becoming revolutionary for Linux :-( Linux on Opteron can't really work right without ACPI. And YingHai's last try at fixing that was rejected by Andi Kleen.
Those patches are now in v2.6.26, thanks to Ingo. Also, there is an update_mptable option that can be used to convert ACPI routing to the mptable -- in 2.6.27-rc1.
please keep in mind, SMI is evil.
YH
On Tue, Jul 29, 2008 at 1:00 PM, yhlu yinghailu@gmail.com wrote:
Those patches are now in v2.6.26, thanks to Ingo.
Great!
Also, there is an update_mptable option that can be used to convert ACPI routing to the mptable -- in 2.6.27-rc1.
please keep in mind, SMI is evil.
I have no argument. I just don't know how we're ever going to get rid of it. We'll see.
ron
On 29/07/08 13:00 -0700, yhlu wrote:
please keep in mind, SMI is evil.
This is the completely wrong attitude to have. SMI is not evil. Saying it is evil is saying that you would prefer to have severely limited functionality to avoid it. *You* might agree, but I don't think that's the message that our IHVs want to hear.
Rather, we should say that SMI is undesirable. We will try to avoid it when at all possible, but we will use it in order to provide the best experience to our customers and when no other alternatives present themselves.
Jordan
On Tue, Jul 29, 2008 at 02:42:23PM -0600, Jordan Crouse wrote:
IHVs
I expect them to be completely oblivious to anything below the OS level. "Do what the BIOS does", "It works on Windows" and so on.
try to avoid it
Why pretend? For business interests, SMM and ACPI are strict and hard requirements because they are currently the single way to accomplish certain things. Anything less is just not good enough. I wouldn't expect the industry to take interest in developing new PC boot architecture. When we have something else that works, there may be interest, but I think we're on our own getting there.
//Peter
Peter Stuge wrote:
try to avoid it
Why pretend? For business interests, SMM and ACPI are strict and hard requirements because they are currently the single way to accomplish certain things. Anything less is just not good enough. I wouldn't expect the industry to take interest in developing new PC boot architecture. When we have something else that works, there may be interest, but I think we're on our own getting there.
There might be cases where some BIOS vendor considered using SMI as the silver bullet, where other methods might be just as reasonable. No idea, maybe they use SMI from ACPI instead of doing things in ACPI directly because SMI is more suitable to "protect intellectual property" (given that ACPI tables _must_ be publicly readable by the OS). We don't have such design limitations, so we can avoid SMI in cases where the others might consider it.
Patrick Georgi
Hi Ron,
On Mon, Jul 28, 2008 at 08:14:55AM -0700, ron minnich wrote:
If you think about it, SMI, VSA, ACPI, EFI, and even the old BIOS -- all are there to virtualize resources that in some cases don't even exist, but in other cases are non-standard. I am wondering about stepping back from the problem and going at it with this approach -- that a runtime BIOS is really there to virtualize resources. Viewed this way, the answer is somewhat easier. The runtime BIOS can be a hypervisor. The models supported by these many varying systems are viewed as subset functions of a hypervisor.
You seem to be suggesting that we could create a bios that just always ran its payload in an emulated machine.
I believe vmware has a product that does something similar - it doesn't run at the BIOS level, but it can use PXE boot (or similar) to launch a hypervisor that takes over the machine and then makes it available for guests to be scheduled on.
The problem I see with this is that a hypervisor can have significant overhead. (One has to task switch to the hypervisor to do IO.) Also, I doubt everyone will agree on a single hypervisor implementation (kvm, vmware, virtualbox, xen, microsoft's vm, etc.).
-Kevin
The problem I see with this is that a hypervisor can have significant overhead. (One has to task switch to the hypervisor to do IO.) Also, I doubt everyone will agree on a single hypervisor implementation (kvm, vmware, virtualbox, xen, microsoft's vm, etc.).
My vote is for virtualbox, it is the sweetest one I have seen yet :-)
On Mon, Jul 28, 2008 at 5:46 PM, Kevin O'Connor kevin@koconnor.net wrote:
You seem to be suggesting that we could create a bios that just always ran its payload in an emulated machine.
it's a question. The PS/3 and XBOX 360 are proofs of concept.
Anyway for me it's a research avenue we may pursue.
The problem I see with this is that a hypervisor can have significant overhead. (One has to task switch to the hypervisor to do IO.) Also, I doubt everyone will agree on a single hypervisor implementation (kvm, vmware, virtualbox, xen, microsoft's vm, etc.).
There is newer hardware, such as Opteron SVM and chipsets with an IOMMU, that allows direct hardware access from a guest.
It is not the case that one must ask the hypervisor to do I/O. More and more frequently, esp. in multicore systems, hypervisors are also used to partition the system, not just time share the CPU. Things are changing rapidly in the hypervisor universe.
thanks
ron
On Sun, Jul 27, 2008 at 7:36 AM, Stefan Reinauer stepan@coresystems.de wrote:
Since we know how big our RAM is when we copy coreboot to RAM, I suggest that we copy coreboot to the end of memory and run it from there. It is a pretty good assumption that no payload will require that space. During memory map creation, we just reserve 256k at the upper end, and we're good.
you mean somewhere below 4g? or below TOM.
Also, for ECC RAM, only RAM below CONFIG_LB_MEM_TOPK gets initialized.
YH
yhlu wrote:
On Sun, Jul 27, 2008 at 7:36 AM, Stefan Reinauer stepan@coresystems.de wrote:
Since we know how big our RAM is when we copy coreboot to RAM, I suggest that we copy coreboot to the end of memory and run it from there. It is a pretty good assumption that no payload will require that space. During memory map creation, we just reserve 256k at the upper end, and we're good.
you mean somewhere below 4g? or below TOM.
End of RAM. On some chipsets, there also seems to be a mechanism to reserve memory space for SMM handlers (TSEG).
Also, for ECC RAM, only RAM below CONFIG_LB_MEM_TOPK gets initialized.
Does this mean all memory has to be initialized? Or is it enough to initialize only the part we're using?
I seem to remember that ECC memory is scrubbed locally by each CPU on startup? Is this not the case?
Stefan
On Sun, Jul 27, 2008 at 5:01 PM, Stefan Reinauer stepan@coresystems.de wrote:
yhlu wrote:
On Sun, Jul 27, 2008 at 7:36 AM, Stefan Reinauer stepan@coresystems.de wrote:
Since we know how big our RAM is when we copy coreboot to RAM, I suggest that we copy coreboot to the end of memory and run it from there. It is a pretty good assumption that no payload will require that space. During memory map creation, we just reserve 256k at the upper end, and we're good.
you mean somewhere below 4g? or below TOM.
End of RAM. On some chipsets, there also seems to be a mechanism to reserve memory space for SMM handlers (TSEG).
So it is less than 4G... like a normal BIOS.
Also, for ECC RAM, only RAM below CONFIG_LB_MEM_TOPK gets initialized.
Does this mean all memory has to be initialized? Or is it enough to initialize only the part we're using?
I seem to remember that ECC memory is scrubbed locally by each CPU on startup? Is this not the case?
Do you mean code from hardware_main(), or other new SMM code?
YH
yhlu wrote:
End of RAM. On some chipsets, there also seems to be a mechanism to reserve memory space for SMM handlers (TSEG).
So it is less than 4G... like a normal BIOS.
eh, yes, right. I guess we would need long mode to go beyond 4G.
Also, for ECC RAM, only RAM below CONFIG_LB_MEM_TOPK gets initialized.
Does this mean all memory has to be initialized? Or is it enough to initialize only the part we're using?
I seem to remember that ECC memory is scrubbed locally by each CPU on startup? Is this not the case?
Do you mean code from hardware_main(), or other new SMM code?
I was referring to coreboot-v2/src/cpu/amd/model_fxx/model_fxx_init.c: init_ecc_memory() for example.
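(The gist of that function, heavily simplified: every location has to be written once so the ECC check bits become consistent, otherwise the first read could report an error; the real code uses cache-bypassing stores and each CPU covers its own node's memory:)

#include <stddef.h>
#include <string.h>

static void ecc_init_range(void *start, size_t len)
{
        /* One write to every byte is enough to make the check bits valid. */
        memset(start, 0, len);
}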
Stefan
On 27.07.2008 16:36, Stefan Reinauer wrote:
I have been working on an SMM handler for coreboot over the last week, so we can support laptops and mainboards with power management features that would otherwise be hard or impossible to support in the near future.
Great!
The SMM handler I wrote switches to 32bit protected mode and all SMI# handling is written in C code, compiled with normal gcc without the gcc16 hack or similar nasty things. Thus it is just another exception handler, like the ones we have in coreboot already.
I like that approach.
One difference is that the SMM handler stays operational once the OS is loaded. The SMM stub living at 0xa8000 currently jumps to the C handler in normal coreboot code in ram, which starts at _RAMBASE. So in order to be able to use the SMM handler, that memory (code, data, bss, and heap; not the stack) needs to stay in place and untouched for the current scenario to work.
Now, memory consumption of the ram part of coreboot is quite tough:
On an example mainboard, I get these values:
code:  102464
data:   53088
bss:    10248
stack:  32768
heap:   32768
------------------
total: 231336
Usually code starts at 0x4000 (_RAMBASE) and, in the above example, reaches up to 0x3e000, occupying almost 4 64k "segments" (0x3a000/237568 bytes)
My first thought was: we could add that memory as reserved in the coreboot table / e820. coreboot keeps a lot of information in memory already, and expects the OS / payload to take care that it is not overwritten:
- mptable
- pirq
- dsdt, and the other ACPI tables
- DMI (required for ACPI on 32bit systems with newer Linux kernels)
- I remember some EBDA issues in the last few days on this list, too
- last but not least of course the coreboot table.
So far, coreboot has been "tricking around" to put these in places that don't hurt and where the OS is going to find them. This has worked in a rough-and-ready manner, but it will fail us in the future, or at least keep us doing quick'n'dirty fixes after every occurrence of a problem.
But to conclude, keeping (parts of) coreboot resident in memory is nothing that SMM would introduce. We have been doing this for many years.
Another alternative to keeping the full coreboot around would be to make the SMM handler self-contained. This would mean the SMM handler could not use coreboot's functions like printk_debug and pci_read_config32, it could not use the device tree, and it would become more complex, because for some information we would have to reprobe the hardware or parse the coreboot table.
From a v3 perspective, it might make sense to keep the SMI handler (if it has limited size) in the boot block. That avoids any and all requirements to keep code in RAM, thereby taking away the problem of code relocation.
In the case of the SMM handler, this would also confine us, because the actual SMI# handling code (written in C) would not be shared between CPUs but would have to be duplicated for every CPU core. However, my current approach only keeps a very small amount of code per CPU, just enough to enter gcc-compiled functions and return from them cleanly.
AFAIK factory BIOS SMM handlers have the ability to lock down the memory segment they are using, protecting them from accidental or deliberate tampering by the OS (which could lead to interesting security issues). Can we do that even if the handler is "somewhere" in RAM?
One of the questions in my mind is: where should we put the coreboot image, if we want to keep it around?
A little excerpt from coreboot v2:
I know the problem of where to put coreboot has been thought about before, elfboot() relocates coreboot to another place when loading an ELF binary that demands the space where coreboot lives:
- coreboot tries to load a segment and finds out that it is in the way.
- coreboot copies itself to a new position
- coreboot jumps into the assembler handler in jmp_to_elf_entry at the
new position
- coreboot tries to start the ELF binary.
- If it fails, it overwrites the loaded ELF binary by copying itself
back and jumping to the original position.
This is quite an interesting concept, but it also makes clear that the ram portion of coreboot itself ("stage2") can not be relocated freely in memory. Yet.
Please see my other mail in this thread about possible problems with a relocatable stage2. Besides that, we'd need a way in v3 to tag a LAR member as PIC (sort of done for the special case of XIP in initram) and new code to figure out a good load address during run time.
Since we know how big our RAM is when we copy coreboot to RAM, I suggest that we copy coreboot to the end of memory and run it from there. It is a pretty good assumption that no payload will require that space. During memory map creation, we just reserve 256k at the upper end, and we're good.
Hm... I assume end of memory is "end of memory below 4G". If we want to avoid conflicts with mapped memory areas of extension cards, stage2 code has to be loaded twice: Once before setting up extension cards (load stage2 to above 1M or other failsafe location) and after setting up extension cards (load stage2 to end of memory below 4G and below extension card space). That's not exactly the nicest code flow I can think of.
Regards, Carl-Daniel
Carl-Daniel Hailfinger wrote:
Another alternative to keeping the full coreboot around would be to make the SMM handler self-contained. This would mean the SMM handler could not use coreboot's functions like printk_debug and pci_read_config32, it could not use the device tree, and it would become more complex, because for some information we would have to reprobe the hardware or parse the coreboot table.
From a v3 perspective, it might make sense to keep the SMI handler (if it has limited size) in the boot block. That avoids any and all requirements to keep code in RAM, thereby taking away the problem of code relocation.
I thought about this, too, but unfortunately it won't work. Two problems:
* This still does not solve the problem for ACPI and the coreboot table. That they survive is currently pure luck, this is version agnostic, and it starts causing trouble as soon as we think about scenarios like cleanly incorporating SeaBIOS.
* There is no heap nor data nor bss available, so we keep all the disadvantages of keeping SMI self-contained, except code duplication of maybe printk_debug() and pci_[read|write]_config[8|16|32] and/or possibly their enhanced PCIe versions. Considering that in a production system printk falls away, making the SMM handler self-contained is a considerable option.
In the case of the SMM handler, this would also confine us, because the actual SMI# handling code (written in C) would not be shared between CPUs but would have to be duplicated for every CPU core. However, my current approach only keeps a very small amount of code per CPU, just enough to enter gcc-compiled functions and return from them cleanly.
AFAIK factory BIOS SMM handlers have the ability to lock down the memory segment they are using, protecting them from accidental or deliberate tampering by the OS (which could lead to interesting security issues). Can we do that even if the handler is "somewhere" in RAM?
Not really. There are certain memory regions that can be locked. But the smi_handler() C function does not live there, currently. If we want to lock out the OS, we need to go down the route of being self-contained.
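(For illustration, the classic Intel-style SMRAM lock looks roughly like this; the register offset and bit layout are chipset specific, and the config-space helpers are placeholders, so treat all of it as a sketch:)

#include <stdint.h>

#define SMRAMC_OFFSET 0x9d   /* chipset specific! */
#define G_SMRAME (1 << 3)    /* enable the A-segment SMRAM       */
#define D_LCK    (1 << 4)    /* lock this register until reset   */
#define D_OPEN   (1 << 6)    /* SMRAM visible to non-SMM code    */

extern uint8_t pci_cfg_read8(int bus, int dev, int fn, int reg);
extern void pci_cfg_write8(int bus, int dev, int fn, int reg, uint8_t val);

static void lock_smram(void)
{
        uint8_t smramc = pci_cfg_read8(0, 0, 0, SMRAMC_OFFSET);

        smramc &= ~D_OPEN;            /* close SMRAM to the OS          */
        smramc |= G_SMRAME | D_LCK;   /* enable it and lock until reset */
        pci_cfg_write8(0, 0, 0, SMRAMC_OFFSET, smramc);
}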
One of the questions in my mind is: where should we put the coreboot image, if we want to keep it around?
A little excerpt from coreboot v2:
I know the problem of where to put coreboot has been thought about before, elfboot() relocates coreboot to another place when loading an ELF binary that demands the space where coreboot lives:
- coreboot tries to load a segment and finds out that it is in the way.
- coreboot copies itself to a new position
- coreboot jumps into the assembler handler in jmp_to_elf_entry at the
new position
- coreboot tries to start the ELF binary.
- If it fails, it overwrites the loaded ELF binary by copying itself
back and jumping to the original position.
This is quite an interesting concept, but it also makes clear that the ram portion of coreboot itself ("stage2") can not be relocated freely in memory. Yet.
Please see my other mail in this thread about possible problems with a relocatable stage2. Besides that, we'd need a way in v3 to tag a LAR member as PIC (sort of done for the special case of XIP in initram) and new code to figure out a good load address during run time.
So you are saying we can't make stage2 relocatable because no one else did it before, or because we would have to write new code? I think both are acceptable risks given that v3 has a user base of exactly zero and is not in shape to carry a port to any non-embedded systems anyway.
Since we know how big our RAM is when we copy coreboot to RAM, I suggest that we copy coreboot to the end of memory and run it from there. It is a pretty good assumption that no payload will require that space. During memory map creation, we just reserve 256k at the upper end, and we're good.
Hm... I assume end of memory is "end of memory below 4G". If we want to avoid conflicts with mapped memory areas of extension cards, stage2 code has to be loaded twice: Once before setting up extension cards (load stage2 to above 1M or other failsafe location) and after setting up extension cards (load stage2 to end of memory below 4G and below extension card space). That's not exactly the nicest code flow I can think of.
Yes, I guess I was referring to anything that fits in the 32bit address space, as there is no coreboot port utilizing 64bit addressing modes. End of memory is the end of _memory_, not the end of the address space. The memory hole for PCI devices is hard coded in coreboot, and most chipsets allow remapping memory "under" the hole to above 4G.
Stefan
On 28.07.2008 14:11, Stefan Reinauer wrote:
Carl-Daniel Hailfinger wrote:
Another alternative to keeping the full coreboot around would be to make the SMM handler self-contained. This would mean the SMM handler could not use coreboot's functions like printk_debug and pci_read_config32, it could not use the device tree, and it would become more complex, because for some information we would have to reprobe the hardware or parse the coreboot table.
From a v3 perspective, it might make sense to keep the SMI handler (if it has limited size) in the boot block. That avoids any and all requirements to keep code in RAM, thereby taking away the problem of code relocation.
I thought about this, too, but unfortunately it won't work. Two problems:
- This still does not solve the problem for ACPI and the coreboot table. That they survive is currently pure luck, this is version agnostic, and it starts causing trouble as soon as we think about scenarios like cleanly incorporating SeaBIOS.
Well, that problem should be solvable with e820 maps or coreboot tables (and I thought it already is).
- There is no heap nor data nor bss available, so we keep all the disadvantages of keeping SMI self-contained, except code duplication of maybe printk_debug() and pci_[read|write]_config[8|16|32] and/or possibly their enhanced PCIe versions. Considering that in a production system printk falls away, making the SMM handler self-contained is a considerable option.
Fair enough. So do we want a separate stage_SMM or should it be part of stage2? The completely standalone SMM solution is also possible, but I guess that choice only makes sense if the stuff to be done in SMM mode is mostly independent of anything in the bootblock.
In the case of the SMM handler, this would also confine us, because the actual SMI# handling code (written in C) would not be shared between CPUs but would have to be duplicated for every CPU core. However, my current approach only keeps a very small amount of code per CPU, just enough to enter gcc-compiled functions and return from them cleanly.
AFAIK factory BIOS SMM handlers have the ability to lock down the memory segment they are using, protecting them from accidental or deliberate tampering by the OS (which could lead to interesting security issues). Can we do that even if the handler is "somewhere" in RAM?
Not really. There are certain memory regions that can be locked. But the smi_handler() C function does not live there, currently. If we want to lock out the OS, we need to go down the route of being self-contained.
I believe that is the one sticking point which forces us to use the self-contained variant.
One of the questions in my mind is: where should we put the coreboot image, if we want to keep it around?
A little excerpt from coreboot v2:
I know the problem of where to put coreboot has been thought about before, elfboot() relocates coreboot to another place when loading an ELF binary that demands the space where coreboot lives:
- coreboot tries to load a segment and finds out that it is in the way.
- coreboot copies itself to a new position
- coreboot jumps into the assembler handler in jmp_to_elf_entry at the
new position
- coreboot tries to start the ELF binary.
- If it fails, it overwrites the loaded ELF binary by copying itself
back and jumping to the original position.
This is quite an interesting concept, but it also makes clear that the ram portion of coreboot itself ("stage2") can not be relocated freely in memory. Yet.
Please see my other mail in this thread about possible problems with a relocatable stage2. Besides that, we'd need a way in v3 to tag a LAR member as PIC (sort of done for the special case of XIP in initram) and new code to figure out a good load address during run time.
So you are saying we can't make stage2 relocatable because no one else did it before, or because we would have to write new code? I think both are acceptable risks given that v3 has a user base of exactly zero and is not in shape to carry a port to any non-embedded systems anyway.
Actually, there are two problems^W challenges with relocatable code in v3:
- Calls from relocatable code (initram etc.) to other chunks of code (like the bootblock). GCC has no mode to emit calls the way we want, and future gcc versions or stronger optimizations will possibly make our current indirect calls unworkable. I have a mostly finished, Segher-approved solution for that problem on my disk. (Needs to be cleaned up before sending.)
- PIC data segment funkiness. IIRC Segher once said that the way we compile and link initram works only by accident and because we don't have any (global) variables outside the stack. It had something to do with data segment locations being treated as relative to code locations. I'm fuzzy on the details, though.
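(One way to picture the "indirect calls" part: the relocated stage never takes external addresses itself, it calls through a table the bootblock hands over at run time. Structure and names are invented for illustration only:)

struct bootblock_exports {
        void (*printk)(const char *fmt, ...);
        unsigned char (*pci_read8)(unsigned int bdf, unsigned int reg);
};

/* The bootblock passes its export table when it enters the relocated
 * stage, so no absolute call targets get baked into the stage's code. */
void stage2_main(const struct bootblock_exports *bb)
{
        bb->printk("stage2 running at whatever address it was copied to\n");
}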
Since we know how big our RAM is when we copy coreboot to RAM, I suggest that we copy coreboot to the end of memory and run it from there. It is a pretty good assumption that no payload will require that space. During memory map creation, we just reserve 256k at the upper end, and we're good.
Hm... I assume end of memory is "end of memory below 4G". If we want to avoid conflicts with mapped memory areas of extension cards, stage2 code has to be loaded twice: Once before setting up extension cards (load stage2 to above 1M or other failsafe location) and after setting up extension cards (load stage2 to end of memory below 4G and below extension card space). That's not exactly the nicest code flow I can think of.
Yes, I guess I was referring to anything that fits in the 32bit address space, as there is no coreboot port utilizing 64bit addressing modes. End of memory is the end of _memory_, not the end of the address space. The memory hole for PCI devices is hard coded in coreboot, and most chipsets allow remapping memory "under" the hole to above 4G.
The only thing I've been able to find in v2 and v3 about memory holes is that they are hardcoded to 1 GB size for GeodeLX and build-time configurable for K8 with a default of 1GB size. That's pretty interesting if someone has 2 graphics cards with 512MB each and another extension card with a few kB. It would be nice if you could point me to other information sources I may have overlooked.
Regards, Carl-Daniel
Stefan Reinauer wrote:
Hi,
...
Another alternative to keeping the full coreboot around would be to make the SMM handler self-contained. This would mean the SMM handler could not use coreboot's functions like printk_debug and pci_read_config32, it could not use the device tree, and it would become more complex, because for some information we would have to reprobe the hardware or parse the coreboot table.
I think you need to consider keeping the runtime SMM code self-contained. Executing code from SMM that is outside the SMM-protected memory region could be risky. printk is a convenient way to debug SMM, but it shouldn't be used at runtime in a normal system since Linux could be using the serial port. The PCI read/write functions are small, so duplication shouldn't be too bad. The device tree might be interesting, but I don't know that SMM really needs it.
In the case of the SMM handler, this would also confine us, because the actual SMI# handling code (written in C) would not be shared between CPUs but would have to be duplicated for every CPU core. However, my current approach only keeps a very small amount of code per CPU, just enough to enter gcc-compiled functions and return from them cleanly.
I didn't really understand this comment. You are saying that there would be generic routines for entering SMM, PCI r/w, etc., and that there is a small section of processor-specific code? I think that the processor- and platform-specific stuff will be the majority of the code, not the other way around. It would be similar to coreboot in that way.
One of the questions in my mind is: where should we put the coreboot image, if we want to keep it around?
...
Since we know how big our RAM is when we copy coreboot to RAM, I suggest that we copy coreboot to the end of memory and run it from there. It is a pretty good assumption that no payload will require that space. During memory map creation, we just reserve 256k at the upper end, and we're good.
The end of memory is an OK place, but you could also consider the legacy locations at 0xE0000-0xFFFFF, since Linux and other OSes tend to avoid that area and look for ACPI and other tables (or pointers to tables) there.
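(For reference, the scan an OS typically does over that range looks something like this; simplified to the ACPI 1.0 20-byte checksum and without the EBDA scan:)

#include <stdint.h>
#include <string.h>

static void *find_rsdp(void)
{
        uintptr_t p;

        /* The RSDP signature must sit on a 16-byte boundary. */
        for (p = 0xe0000; p < 0x100000; p += 16) {
                const uint8_t *c = (const uint8_t *)p;
                uint8_t sum = 0;
                int i;

                if (memcmp(c, "RSD PTR ", 8) != 0)
                        continue;
                for (i = 0; i < 20; i++)   /* ACPI 1.0 RSDP is 20 bytes */
                        sum += c[i];
                if (sum == 0)
                        return (void *)p;
        }
        return NULL;
}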
Marc
On 28.07.2008 21:22, Marc Jones wrote:
Stefan Reinauer wrote:
Another alternative to keeping the full coreboot around would be to make the SMM handler self-contained. This would mean the SMM handler could not use coreboot's functions like printk_debug and pci_read_config32, it could not use the device tree, and it would become more complex, because for some information we would have to reprobe the hardware or parse the coreboot table.
I think you need to consider keeping the runtime SMM code self-contained. Executing code from SMM that is outside the SMM-protected memory region could be risky. printk is a convenient way to debug SMM, but it shouldn't be used at runtime in a normal system since Linux could be using the serial port.
That's why we have the ability to tell printk to log to a memory buffer only.
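(A minimal sketch of such a memory-only console backend; the real coreboot mechanism differs in the details:)

#include <stdint.h>

#define MEMCONSOLE_SIZE 4096

static struct {
        uint32_t cursor;
        char body[MEMCONSOLE_SIZE];
} memconsole;

static void memconsole_tx_byte(unsigned char c)
{
        /* No hardware is touched, so there is no fight with the OS over
         * the serial port; the buffer simply wraps around when full. */
        memconsole.body[memconsole.cursor++ % MEMCONSOLE_SIZE] = c;
}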
The PCI read/write functions are small, so duplication shouldn't be too bad.
PCI config read/write is probably the one thing we never want to do in SMM because the OS could be in the middle of a CF8/CFC cycle.
Regards, Carl-Daniel
Carl-Daniel Hailfinger wrote:
The PCI read/write functions are small, so duplication shouldn't be too bad.
PCI config read/write is probably the one thing we never want to do in SMM because the OS could be in the middle of a CF8/CFC cycle.
SMM will have to read config space. No way around it. cf8 just needs to be saved and restored.
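(Something along these lines, assuming the usual inl/outl port helpers; the point is just the save/restore of 0xCF8 around our own cycle:)

#include <stdint.h>

extern uint32_t inl(uint16_t port);
extern void outl(uint32_t val, uint16_t port);

static uint32_t smm_pci_read_config32(unsigned int bus, unsigned int dev,
                                      unsigned int fn, unsigned int reg)
{
        uint32_t saved = inl(0xcf8);   /* whatever address the OS set up */
        uint32_t addr  = 0x80000000u | (bus << 16) | (dev << 11) |
                         (fn << 8) | (reg & 0xfc);
        uint32_t val;

        outl(addr, 0xcf8);
        val = inl(0xcfc);
        outl(saved, 0xcf8);            /* hand the OS its cycle back     */
        return val;
}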
Marc