Hi,
Porting LinuxBIOS to new motherboards has become easier and easier over time. There's almost no need for assembler coding anymore, and HyperTransport-based systems set up their non-coherent devices completely automatically. On K8 systems even the coherent devices get initialized automatically. But there is still one major drawback when it comes to booting an operating system: passing the information.
No, this is not going to be a discussion about whether this or that table is preferred. The problem is simply that for each motherboard these tables have to be redone over and over: The pirq table, the mp table and the acpi tables.
This leads to hand made tables that often contain errors or have to be adapted with architectural changes that might have consequences wrt the bus numbering for instance.
* pirq tables do need knowledge that is not provided in the config files yet (wiring)
* MP tables contain a static "compatibility part" and have to have entries for devices on the bus and their interrupts. This is very similar to the pirq table. But they also need information on available APICs. These could be provided from the device tree. We know we have an 8131 in there, so we know 2 IOAPICs belong on the list. We also know what busses hang off that 8131, so we can generate most of the interrupt tables.
* ACPI tables need information on the APICs as well. The ACPI implementation I wrote a while ago is completely static and basically only works for systems with a single IOAPIC, and not very well even on those.
Autocreation of those tables should belong to the driver code of each supported device. The information about the 8131 should come from the 8131 code, the information for the 8111 should come from the 8111 code, and so on.
The solution could be to enhance the struct device_operations by an additional member write_tables(device_t dev, table_t id) which can be subsequently called by each of the write_*_tables() functions, adding their part to the table.
This would also allow to extend the generic information provided by the bridges, by adding such functionality to the mainboard specific code, so we won't end up with something that is worse than now in any case.
Roughly thinking, table_t could look like:
typedef enum {
        MPTABLE_CPUS,
        MPTABLE_APICS,
        MPTABLE_BUSSES,
        MPTABLE_DEVICES,
        ACPI_APICS,
        ACPI_COMPLETE_TABLES,
} table_t;
and dev::write_tables() would look similar to this:
static void amd8111_write_tables(device_t dev, table_t id)
{
        struct resource *res;

        switch (id) {
        case MPTABLE_BUSSES:
                smp_write_bus(mc, dev.link.secondary, "PCI ");
                [...]
                break;
        case MPTABLE_APICS:
                res = find_resource(dev, PCI_BASE_ADDRESS_0);
                if (!res)
                        break;
                smp_write_ioapic(mc, last_ioapic()+1, AMD8131_IOAPIC_VERSION, res->base);
                [...]
                break;
        case ACPI_COMPLETE_TABLES:
                acpi_create_hpet(...);
                [...]
                break;
        case ACPI_APICS:
                // We're adding an APIC to an ACPI MADT:
                acpi_create_madt_ioapic(...);
                [...]
                break;
        default:
                // all unneeded and unknown table types are ignored.
                break;
        }
}
Since everything is a device in LinuxBIOS, we could create these tables in a nice and ordered manner.
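A minimal, self-contained sketch of how such a hook could be driven, for illustration only. The cut-down `struct device` and `struct device_operations`, the `TABLE_ID_COUNT` marker, the `entries_written` counters and the `demo_*` drivers are all hypothetical stand-ins and not the real LinuxBIOS definitions; the counters merely take the place of actual table output.

```c
#include <stddef.h>

/* Table sections, taken from the enum proposed above. */
typedef enum {
    MPTABLE_CPUS,
    MPTABLE_APICS,
    MPTABLE_BUSSES,
    MPTABLE_DEVICES,
    ACPI_APICS,
    ACPI_COMPLETE_TABLES,
    TABLE_ID_COUNT              /* hypothetical: number of sections */
} table_t;

struct device;

/* Hypothetical, cut-down stand-in for struct device_operations. */
struct device_operations {
    void (*write_tables)(struct device *dev, table_t id);
};

/* Hypothetical, cut-down stand-in for the device structure. */
struct device {
    const struct device_operations *ops;
    struct device *next;        /* next device in the global list */
};

/* Stand-in for real table output: entries emitted per section. */
static int entries_written[TABLE_ID_COUNT];

/* Demo driver: the 8131 contributes two IOAPICs (as stated in the post). */
static void demo_8131_write_tables(struct device *dev, table_t id)
{
    (void)dev;
    switch (id) {
    case MPTABLE_APICS:
        entries_written[id] += 2;
        break;
    default:
        /* all unneeded and unknown table types are ignored */
        break;
    }
}

/* Demo driver: the 8111 contributes one IOAPIC. */
static void demo_8111_write_tables(struct device *dev, table_t id)
{
    (void)dev;
    switch (id) {
    case MPTABLE_APICS:
        entries_written[id] += 1;
        break;
    default:
        break;
    }
}

/* What each write_*_tables() function would call for its section:
 * every device gets a chance to add its part, in list order. */
static void write_table_section(struct device *all_devices, table_t id)
{
    for (struct device *dev = all_devices; dev; dev = dev->next)
        if (dev->ops && dev->ops->write_tables)
            dev->ops->write_tables(dev, id);
}
```

The generic code never needs to know which chips are present; a device without a `write_tables` member is simply skipped.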
Comments? Flames? Better ideas?
Stefan
It all sounds good except I would really like to try generating s-expressions as well. I am convinced that the binary table thing is going to cause us future trouble, and the reason I am so convinced is that every binary table I've ever seen had troubles not long after it was disseminated as a standard.
ron
* Ronald G. Minnich rminnich@lanl.gov [050117 21:48]:
It all sounds good except I would really like to try generating s-expressions as well. I am convinced that the binary table thing is going to cause us future trouble, and the reason I am so convinced is that every binary table I've ever seen had troubles not long after it was disseminated as a standard.
You might be right. But it seems that it becomes harder and harder to boot a machine without ACPI and mptables. Until we have Linux and other OSes convinced of something better we might have to grin and bear it.
Stefan
On Mon, 17 Jan 2005, Stefan Reinauer wrote:
You might be right. But it seems that it becomes harder and harder to boot a machine without ACPI and mptables. Until we have Linux and other OSes convinced of something better we might have to grin and bear it.
no argument that we have to create those tables, I just don't want the baseline format to be binary, if at all possible. It's very non-portable to non-x86 systems.
ron
* Ronald G. Minnich rminnich@lanl.gov [050118 03:31]:
no argument that we have to create those tables, I just don't want the baseline format to be binary, if at all possible. It's very non-portable to non-x86 systems.
The baseline really is our internal device tree representation. Everything else should be generated from it.
Non-x86 has been kept rather clean from ACPI, MP, pirq. They did not spend the time to invent the same information passing 3 times ;-)
Those platforms rather offer an IEEE 1275-1994 interface (which is binary+callback, all evil combined, but it is well proven)
Stefan
On Tue, 18 Jan 2005, Stefan Reinauer wrote:
Those platforms rather offer an IEEE 1275-1994 interface (which is binary+callback, all evil combined, but it is well proven)
the consensus seems to be that we make the tree contain what we need, and the last step is to generate tables from the tree. It sure makes sense to me.
We also know that we must generate mptable, irq table, and acpi (I just *love* the PC) since they all contain info that is overlapping, somewhat different, and all needed.
For callbacks, we let OpenBIOS handle that.
I'd still like to see the code that does the s-expr generation but I'm on enough other projects right now that my time is too tight. HELP!
ron
On Jan 18, 2005, at 11:47 AM, Ronald G. Minnich wrote:
On Tue, 18 Jan 2005, Stefan Reinauer wrote:
Those platforms rather offer an IEEE 1275-1994 interface (which is binary+callback, all evil combined, but it is well proven)
the consensus seems to be that we make the tree contain what we need, and the last step is to generate tables from the tree. It sure makes sense to me.
We also know that we must generate mptable, irq table, and acpi (I just *love* the PC) since they all contain info that is overlapping, somewhat different, and all needed.
For callbacks, we let OpenBIOS handle that.
I'd still like to see the code that does the s-expr generation but I'm on enough other projects right now that my time is too tight. HELP!
I'd like to hear more about what Stefan had in mind for the 'small set of C functions'. Maybe the simplest way would be to pass the device tree itself to the payload? I guess it wouldn't solve the binary/ascii problem, but it would sure as hell make the code easy.
Greg
On Tue, 18 Jan 2005, Greg Watson wrote:
I'd like to hear more about what Stefan had in mind for the 'small set of C functions'. Maybe the simplest way would be to pass the device tree itself to the payload? I guess it wouldn't solve the binary/ascii problem, but it would sure as hell make the code easy.
no, that will not work, due to the compiler portability issues. The Plan 9 C compiler won't work against GCC structs in any cases where __attribute(xyz) has been used. We have to be careful here -- not all payloads are compiled with gcc.
That's why I favor the s-expression approach. Binary trees are not going to work.
ron
"Ronald G. Minnich" rminnich@lanl.gov writes:
On Tue, 18 Jan 2005, Greg Watson wrote:
I'd like to hear more about what Stefan had in mind for the 'small set of C functions'. Maybe the simplest way would be to pass the device tree itself to the payload? I guess it wouldn't solve the binary/ascii problem, but it would sure as hell make the code easy.
no, that will not work, due to the compiler portability issues. The Plan 9 C compiler won't work against GCC structs in any cases where __attribute(xyz) has been used. We have to be careful here -- not all payloads are compiled with gcc.
That's why I favor the s-expression approach. Binary trees are not going to work.
Taking this one step farther I am not at all convinced that we want to even export a tree. The simplest representation is actually a graph of the connections between hardware devices. This requires one list of hardware devices and a second list of connections.
If we don't export what is physically possible some creative hardware designer will gang up on us in the future. And I'm not certain we don't need that to properly represent irqs in any event.
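To make the "one list of devices, one list of connections" idea concrete, here is a minimal sketch. All type and function names (`hw_device`, `hw_link`, `hw_graph`, `count_links_to`) and the two link kinds are assumptions made up for illustration; nothing like this exists in the tree.

```c
#include <stddef.h>

/* Hypothetical connection kinds; a real format would likely need more. */
enum link_kind { LINK_BUS, LINK_IRQ };

/* First list: the hardware devices themselves. */
struct hw_device {
    const char *name;
};

/* Second list: connections, each naming two devices by index. */
struct hw_link {
    size_t from;
    size_t to;
    enum link_kind kind;
};

struct hw_graph {
    const struct hw_device *devices;
    size_t n_devices;
    const struct hw_link *links;
    size_t n_links;
};

/* Count the connections of one kind that end at device `to`.
 * In a tree this could never exceed 1; in a graph it can. */
static size_t count_links_to(const struct hw_graph *g, size_t to,
                             enum link_kind kind)
{
    size_t n = 0;

    for (size_t i = 0; i < g->n_links; i++)
        if (g->links[i].to == to && g->links[i].kind == kind)
            n++;
    return n;
}
```

With this layout an IOAPIC can hang off one bus while receiving interrupt lines from devices on several other buses, which a strict parent/child tree cannot express directly.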
Eric
On Tue, 18 Jan 2005, Eric W. Biederman wrote:
Taking this one step farther I am not at all convinced that we want to even export a tree. The simplest representation is actually a graph of the connections between hardware devices. This requires one list of hardware devices and a second list of connections.
works for me.
ron
* Ronald G. Minnich rminnich@lanl.gov [050118 20:05]:
I'd like to hear more about what Stefan had in mind for the 'small set of C functions'. Maybe the simplest way would be to pass the device tree itself to the payload? I guess it wouldn't solve the binary/ascii problem, but it would sure as hell make the code easy.
no, that will not work, due to the compiler portability issues. The Plan 9 C compiler won't work against GCC structs in any cases where __attribute(xyz) has been used. We have to be careful here -- not all payloads are compiled with gcc.
That's why I favor the s-expression approach. Binary trees are not going to work.
What I meant is: There should be a library that people can use that parses s-expressions or whatever is used in the end and work on this information. So you can do foo=find-lbtable("memorymap"); Any payload will want a set of functions like this that can just be compiled and linked. It is not about copying binary data from one edge to another, it is about not having every LinuxBIOS application developer looking for his favourite s-expression library and starting to look for tags and formats. Using a very simple parser s-expressions or xml is perfectly fine for exchanging data. It won't have to do a lot of syntax or semantics checking either since we can probably rely on the fact that the table in memory was produced by another piece of code that has no form errors.
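As a rough sketch of what one such library function could look like: the code below assumes the table is a NUL-terminated s-expression of the form `((name ...) (name ...))` without quoted strings, and the C name `find_lbtable` is simply the `find-lbtable` from above spelled as a valid identifier. Neither the format nor the signature has been agreed on anywhere; both are assumptions.

```c
#include <stddef.h>
#include <string.h>

/* Return a pointer to the "(" that opens the top-level entry called
 * `name`, or NULL if there is none. Like the post suggests, this does
 * almost no syntax checking and trusts the producer of the table. */
static const char *find_lbtable(const char *text, const char *name)
{
    size_t len = strlen(name);
    int depth = 0;

    for (const char *p = text; *p; p++) {
        if (*p == '(') {
            depth++;
            /* Entries are the lists nested directly in the outer list.
             * The name must be followed by a space or ")" so that
             * "cpu" does not match an entry called "cpus". */
            if (depth == 2 && strncmp(p + 1, name, len) == 0 &&
                (p[1 + len] == ' ' || p[1 + len] == ')'))
                return p;
        } else if (*p == ')') {
            depth--;
        }
    }
    return NULL;
}
```

A payload would call something like `find_lbtable(table, "memorymap")` and then hand the returned sub-expression to its own, equally small, value parser.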
Stefan
Stefan Reinauer stepan@openbios.org writes:
* Ronald G. Minnich rminnich@lanl.gov [050118 20:05]:
I'd like to hear more about what Stefan had in mind for the 'small set of C functions'. Maybe the simplest way would be to pass the device tree itself to the payload? I guess it wouldn't solve the binary/ascii problem, but it would sure as hell make the code easy.
no, that will not work, due to the compiler portability issues. The Plan 9 C compiler won't work against GCC structs in any cases where __attribute(xyz) has been used. We have to be careful here -- not all payloads are compiled with gcc.
That's why I favor the s-expression approach. Binary trees are not going to work.
What I meant is: There should be a library that people can use that parses s-expressions or whatever is used in the end and work on this information. So you can do foo=find-lbtable("memorymap"); Any payload will want a set of functions like this that can just be compiled and linked. It is not about copying binary data from one edge to another, it is about not having every LinuxBIOS application developer looking for his favourite s-expression library and starting to look for tags and formats. Using a very simple parser s-expressions or xml is perfectly fine for exchanging data. It won't have to do a lot of syntax or semantics checking either since we can probably rely on the fact that the table in memory was produced by another piece of code that has no form errors.
A reference implementation sounds sane. We can't assume that everyone will be using the same code or the latest code, but having something to compare with sounds good. Except that it has not been factored out into a library, that is roughly what is in the freebios/util/lb-dump directory.
Eric
On Thu, 20 Jan 2005, Stefan Reinauer wrote:
What I meant is: There should be a library that people can use that parses s-expressions or whatever is used in the end and work on this information. So you can do foo=find-lbtable("memorymap");
excellent idea.
ron
* Stefan Reinauer stepan@openbios.org [050115 22:07]:
typedef enum {
        MPTABLE_CPUS,
        [..]
        ACPI_COMPLETE_TABLES,
} table_t;
and dev::write_tables() would look similar to this:
static void amd8111_write_tables(device_t dev, table_t id)
{
        struct resource *res;

        switch (id) {
        case MPTABLE_BUSSES:
                [...]
                break;
        case MPTABLE_APICS:
                [...]
                break;
        case ACPI_COMPLETE_TABLES:
                [...]
                break;
        case ACPI_APICS:
                [...]
                break;
        default:
                // all unneeded and unknown table types are ignored.
                break;
        }
}
Thinking about Ron's comments, we might want to keep the different table types more separated.
Thus, instead of having dev::devops::write_tables(), would it be preferable to have several hooks for
1) linuxbios table
2) ACPI tables
3) MP table
4) pirq table
The more complete 1 and 2 become, the less 3 and 4 will be needed. In a perfect world we could choose to stop calling 2-4 and not compile them in. This is nicer if the table generation is more separated.
Regarding non-x86 systems: We don't want 2-4 on those anyways since no operating system _expects_ they are there. And we definitely don't want to change that.
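One way the separate hooks could be laid out, as a sketch only. The hook names, the cut-down `struct device`, the `demo_*` functions and the `CONFIG_X86_TABLES` build option are all hypothetical; the counters stand in for real table output.

```c
#include <stddef.h>

/* Hypothetical build option: set to 0 to leave hooks 2-4 uncalled,
 * e.g. on non-x86 targets. */
#ifndef CONFIG_X86_TABLES
#define CONFIG_X86_TABLES 1
#endif

/* One slot per table family, in the order listed above. */
enum table_hook {
    HOOK_LB_TABLE,      /* 1) linuxbios table */
    HOOK_ACPI,          /* 2) ACPI tables */
    HOOK_MPTABLE,       /* 3) MP table */
    HOOK_PIRQ,          /* 4) pirq table */
    HOOK_COUNT
};

struct device;
typedef void (*table_hook_t)(struct device *dev);

/* Hypothetical, cut-down device: only what the sketch needs. */
struct device {
    table_hook_t hooks[HOOK_COUNT];   /* NULL: nothing to contribute */
    struct device *next;
};

/* Stand-in for real table output: calls made per hook. */
static int demo_calls[HOOK_COUNT];

static void demo_lb_hook(struct device *dev)
{
    (void)dev;
    demo_calls[HOOK_LB_TABLE]++;
}

static void demo_acpi_hook(struct device *dev)
{
    (void)dev;
    demo_calls[HOOK_ACPI]++;
}

/* Let every device add its part to one table family. */
static void run_hook(struct device *devs, enum table_hook hook)
{
    for (struct device *dev = devs; dev; dev = dev->next)
        if (dev->hooks[hook])
            dev->hooks[hook](dev);
}

static void write_all_tables(struct device *devs)
{
    run_hook(devs, HOOK_LB_TABLE);
#if CONFIG_X86_TABLES
    run_hook(devs, HOOK_ACPI);
    run_hook(devs, HOOK_MPTABLE);
    run_hook(devs, HOOK_PIRQ);
#endif
}
```

Because each family has its own slot, dropping a family is a matter of not calling (and not compiling) its pass, without touching the others.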
Stefan
Stefan Reinauer stepan@openbios.org writes:
Hi,
Porting LinuxBIOS to new motherboards has become easier and easier over time. There's almost no need for assembler coding anymore, and HyperTransport-based systems set up their non-coherent devices completely automatically. On K8 systems even the coherent devices get initialized automatically. But there is still one major drawback when it comes to booting an operating system: passing the information.
No, this is not going to be a discussion about whether this or that table is preferred. The problem is simply that for each motherboard these tables have to be redone over and over: The pirq table, the mp table and the acpi tables.
I agree that there is an issue particularly with respect to interrupts. A lot of this has waited until we have the time to do this properly.
This leads to hand made tables that often contain errors or have to be adapted with architectural changes that might have consequences wrt the bus numbering for instance.
pirq tables do need knowledge that is not provided in the config files yet (wiring)
MP tables contain a static "compatibility part" and have to have entries for devices on the bus and their interrupts. This is very similar to the pirq table. But they also need information on available APICs. These could be provided from the device tree. We know we have an 8131 in there, so we know 2 IOAPICs belong on the list. We also know what busses hang off that 8131, so we can generate most of the interrupt tables.
Yes.
* ACPI tables need information on the APICs as well. The ACPI implementation I wrote a while ago is completely static and basically only works for systems with a single IOAPIC, and not very well even on those.
Autocreation of those tables should belong to the driver code of each supported device.
No. The devices should have no idea about the format of the data we present to the user. We should push all of the information into the device tree so we can derive it from there.
The information about the 8131 should come from the 8131 code, the information for the 8111 should come from the 8111 code, and so on.
Agreed. The information should be associated with the device in the device tree.
The solution could be to enhance the struct device_operations by an additional member write_tables(device_t dev, table_t id) which can be subsequently called by each of the write_*_tables() functions, adding their part to the table.
No. A write_tables method is bad. We need to enhance the dynamic device tree with irq information, and possibly with something like pci class so we can recognize devices with well-known software programming interfaces.
This would also allow to extend the generic information provided by the bridges, by adding such functionality to the mainboard specific code, so we won't end up with something that is worse than now in any case.
I think allowing additional work to be done at a per port level is a valid critique. I would prefer we leave it until we find an actual need however.
Roughly thinking, table_t could look like:
That is terrible.
Since everything is a device in LinuxBIOS, we could create these tables in a nice and ordered manner.
Comments? Flames? Better ideas?
For a subset of the idea look at how we generate the cpu information and the memory information directly from the LinuxBIOS table already.
We need an internal format for the information that we can consume and control, and enhance. The fact that we are passing on that information is secondary.
For IRQ routing something very like the work done with Open Firmware is needed. Open Firmware actually cannot represent x86 irq routing, as it cannot handle the separate descriptions of apic and non-apic modes. But otherwise it should be able to handle everything.
Eric
* Eric W. Biederman ebiederman@lnxi.com [050118 12:51]:
I agree that there is an issue particularly with respect to interrupts. A lot of this has waited until we have the time to do this properly.
I agree. However, I also think we are coming close to the point where the existing infrastructure is fine enough to handle distributed table generation sanely.
Autocreation of those tables should belong to the driver code of each supported device.
No. The devices should have no idea about the format of the data we present to the user. We should push all of the information into the device tree so we can derive it from there.
This is certainly true. But it will require some extra layer to be introduced that is completely missing at the moment. Representation of IOAPICs in the device tree is completely missing at the moment, for example. We will also need quite some extra information from the config files.
No. A write_tables method is bad. We need to enhance the dynamic device tree with irq information, and possibly with something like pci class so we can recognize devices with well-known software programming interfaces.
ACPI has things like IRQ override entries in its MADT. The specification allows quite a few other pretty exceptional ways of working around hardware layout and kernel design. I am not sure how we want to associate information like this with a certain device. Is it part of the "mainboard" device then? I am scared that we have to invent a proper representation of things like this just to be able to convert it to ACPI later on, while in the end ACPI tables are simple to produce, and on non-x86 the hardware and the Linux kernel are less broken.
This would also allow to extend the generic information provided by the bridges, by adding such functionality to the mainboard specific code, so we won't end up with something that is worse than now in any case.
I think allowing additional work to be done at a per port level is a valid critique. I would prefer we leave it until we find an actual need however.
This might be now, even though I am probably able to work around it by refactoring the ACPI code, separating the table creator functions from the main function calling them. The latter would go to the motherboard directory, next to mptable.c and irq_table.c. The problem I have is that I designed the ACPI code originally for the AMD Solo motherboard, and someone updated it to be useful on the Epia. But with every new motherboard, different devices need to be filled into the MADT. Say "then drop ACPI", but Linux is not able to boot this machine properly without ACPI. As sad as it is.
Roughly thinking, table_t could look like:
That is terrible.
:-)))) Which is why I kept myself from implementing it yet.
For a subset of the idea look at how we generate the cpu information and the memory information directly from the LinuxBIOS table already.
Can you give me a quick pointer? I am not sure that I am looking at the right piece of code.
We need an internal format for the information that we can consume and control, and enhance. The fact that we are passing on that information is secondary.
I agree, though plugging a good concept between 2 broken ones might be hard.
For IRQ routing something very like the work done with Open Firmware is needed. Open Firmware actually cannot represent x86 irq routing, as it cannot handle the separate descriptions of apic and non-apic modes. But otherwise it should be able to handle everything.
How does it handle APIC modes? If there is an APIC, I don't see any need to go non-APIC except for academic interest.
Stefan
Stefan Reinauer stepan@openbios.org writes:
* Eric W. Biederman ebiederman@lnxi.com [050118 12:51]:
I agree that there is an issue particularly with respect to interrupts. A lot of this has waited until we have the time to do this properly.
I agree. However, I also think we are coming close to the point where the existing infrastructure is fine enough to handle distributed table generation sanely.
Agreed.
Autocreation of those tables should belong to the driver code of each supported device.
No. The devices should have no idea about the format of the data we present to the user. We should push all of the information into the device tree so we can derive it from there.
This is certainly true. But it will require some extra layer to be introduced that is completely missing at the moment. Representation of IOAPICs in the device tree is completely missing at the moment, for example. We will also need quite some extra information from the config files.
A small amount or we just get the devices to populate the IOAPICs. When the IOAPICs happen to be a pci device we already represent them. When the path to an IOAPIC is something else we don't. Devices with well known programming interfaces need a little extra care so we can report their well known resources properly.
No. A write_tables method is bad. We need to enhance the dynamic device tree with irq information, and possibly with something like pci class so we can recognize devices with well-known software programming interfaces.
ACPI has things like IRQ override entries in its MADT. The specification allows quite a few other pretty exceptional ways of working around hardware layout and kernel design. I am not sure how we want to associate information like this with a certain device. Is it part of the "mainboard" device then? I am scared that we have to invent a proper representation of things like this just to be able to convert it to ACPI later on, while in the end ACPI tables are simple to produce, and on non-x86 the hardware and the Linux kernel are less broken.
Maybe. Let's handle the sane cases and see where that puts us. With respect to irq handling we need enough information to route the interrupts before Linux loads. With the pirq and other interrupt routing tables simply standing in the background.
This would also allow to extend the generic information provided by the bridges, by adding such functionality to the mainboard specific code, so we won't end up with something that is worse than now in any case.
I think allowing additional work to be done at a per port level is a valid critique. I would prefer we leave it until we find an actual need however.
This might be now, even though I am probably able to work around it by refactoring the ACPI code, separating the table creator functions from the main function calling them. The latter would go to the motherboard directory, next to mptable.c and irq_table.c. The problem I have is that I designed the ACPI code originally for the AMD Solo motherboard, and someone updated it to be useful on the Epia. But with every new motherboard, different devices need to be filled into the MADT. Say "then drop ACPI", but Linux is not able to boot this machine properly without ACPI. As sad as it is.
Hmm. I had not seen that problem. But the SOLO was always a weird board. Is this because Linux has/had a bug in its pirq table parser and did not recognize the 8111? Is the problem that we did not get the fix pushed upstream?
Roughly thinking, table_t could look like:
That is terrible.
:-)))) Which is why I kept myself from implementing it yet.
For a subset of the idea look at how we generate the cpu information and the memory information directly from the LinuxBIOS table already.
Can you give me a quick pointer? I am not sure that I am looking at the right piece of code.
Sorry I don't have a good example handy. I just know that for some parts of the process we already do use the existing tables. It is not very sophisticated.
A good starting exercise would be to get the apics and ioapics into the device tree (which requires no extensions) and simply hard code the interrupt source information. That should generate about half of the interrupt routing table.
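A sketch of that exercise under simplifying assumptions: the devices only record neutral facts (here, "I am an IOAPIC at this base address"), and a separate generator walks the tree and derives the entries. The `tree_dev` and `ioapic_entry` types, the child/sibling layout and the id numbering via `first_id` are hypothetical and are not the existing device tree code.

```c
#include <stddef.h>
#include <stdint.h>

enum dev_kind { DEV_OTHER, DEV_IOAPIC };

/* Hypothetical device tree node carrying format-neutral data only. */
struct tree_dev {
    enum dev_kind kind;
    uint32_t base;                  /* MMIO base, for DEV_IOAPIC nodes */
    const struct tree_dev *child;   /* first child */
    const struct tree_dev *sibling; /* next node on the same level */
};

/* What a table generator needs per IOAPIC, whatever the output format. */
struct ioapic_entry {
    uint8_t id;
    uint32_t base;
};

/* Depth-first walk over `dev` and its siblings. Appends one entry per
 * IOAPIC node to out[] (up to `max`) and returns the new entry count.
 * Ids are handed out sequentially starting at `first_id`, which is an
 * assumption of this sketch. */
static size_t collect_ioapics(const struct tree_dev *dev, uint8_t first_id,
                              struct ioapic_entry *out, size_t max,
                              size_t count)
{
    for (; dev; dev = dev->sibling) {
        if (dev->kind == DEV_IOAPIC && count < max) {
            out[count].id = (uint8_t)(first_id + count);
            out[count].base = dev->base;
            count++;
        }
        count = collect_ioapics(dev->child, first_id, out, max, count);
    }
    return count;
}
```

The same `ioapic_entry` list could then feed an MP table writer and an ACPI MADT writer alike, so no device code has to know either format.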
We need an internal format for the information that we can consume and control, and enhance. The fact that we are passing on that information is secondary.
I agree, though plugging a good concept between 2 broken ones might be hard.
True. But one small step at a time. If we are lucky we won't have to support the really broken ones. And if not we will have enough momentum to carry us through.
For IRQ routing something very like the work done with Open Firmware is needed. Open Firmware actually cannot represent x86 irq routing, as it cannot handle the separate descriptions of apic and non-apic modes. But otherwise it should be able to handle everything.
How does it handle APIC modes? If there is an APIC, I don't see any need to go non-APIC except for academic interest.
Open firmware reports interrupts as a property of a bus. And then those bus lines are hooked somewhere.
So there are two mapping steps. Device to bus interrupt line. Bus interrupt line to interrupt sink. And if we can have multiple interrupt sinks per bus we may be able to model everything trivially.
And of course there is a simple default that the bus interrupt lines connect up to their corresponding bus interrupt lines in the parent bus.
Consuming that and programming the appropriate interrupt routers would also be interesting.
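The two mapping steps and the default could be sketched roughly as follows. This is one reading of the description above, not Open Firmware's actual data layout: the `bus`, `irq_sink` and `irq_target` types, the four lines per bus and the per-mode sink slots (one way to allow multiple sinks per bus line) are all assumptions.

```c
#include <stddef.h>

#define BUS_IRQ_LINES 4     /* assumed: INTA..INTD style bus lines */

/* Assumed: one possible sink per line for each interrupt mode. */
enum irq_mode { IRQ_MODE_PIC, IRQ_MODE_APIC, IRQ_MODE_COUNT };

/* An interrupt sink, e.g. an IOAPIC or a legacy interrupt controller. */
struct irq_sink {
    const char *name;
};

struct irq_target {
    const struct irq_sink *sink;    /* NULL: line is not hooked up here */
    int input;                      /* input pin on the sink */
};

struct bus {
    const struct bus *parent;
    struct irq_target line[BUS_IRQ_LINES][IRQ_MODE_COUNT];
};

/* Step 1 (device to bus interrupt line) is left to the caller, who
 * passes the resulting line number in `line`.
 * Step 2 (bus interrupt line to sink) is done here. A line with no
 * sink on this bus falls back to the default: it connects to the
 * corresponding line of the parent bus. */
static const struct irq_target *resolve_irq(const struct bus *bus, int line,
                                            enum irq_mode mode)
{
    if (line < 0 || line >= BUS_IRQ_LINES)
        return NULL;
    for (; bus; bus = bus->parent)
        if (bus->line[line][mode].sink)
            return &bus->line[line][mode];
    return NULL;    /* line is not connected anywhere */
}
```

Code that programs the interrupt routers could walk the same structure in the other direction, from each sink input back to the lines that feed it.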
Eric
On Jan 18, 2005, at 4:51 AM, Eric W. Biederman wrote:
Stefan Reinauer stepan@openbios.org writes:
* ACPI tables need information on the APICs as well. The ACPI implementation I wrote a while ago is completely static and basically only works for systems with a single IOAPIC, and not very well even on those.
Autocreation of those tables should belong to the driver code of each supported device.
No. The devices should have no idea about the format of the data we present to the user. We should push all of the information into the device tree so we can derive it from there.
I agree. The right way to do this is to provide the device tree with a single method for serializing itself. If ACPI, MP, pirq information needs to be passed, then it should be included in the device tree. We may need to add additional device tree information to deal with the IOAPICs problem.
The only issue really is what format to use for serialization. I'm leaning towards s-expressions for use with openbios. However, it's conceivable that different serialization methods could be provided for different payloads, though probably not desirable.
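For illustration, a minimal sketch of a tree serializing itself as an s-expression into a caller-supplied buffer. The `node` type and the output shape `(name child child ...)` are assumptions; a real format would also carry properties and would have to quote names containing spaces or parentheses.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical, cut-down tree node: a name plus child/sibling links. */
struct node {
    const char *name;
    const struct node *child;       /* first child */
    const struct node *sibling;     /* next node on the same level */
};

/* Append `s` at *pos; return 0 on success, -1 if it does not fit.
 * Requires *pos < size on entry, which holds after every success. */
static int put(char *buf, size_t size, size_t *pos, const char *s)
{
    int n = snprintf(buf + *pos, size - *pos, "%s", s);

    if (n < 0 || (size_t)n >= size - *pos)
        return -1;
    *pos += (size_t)n;
    return 0;
}

/* Write "(name child child ...)" for `n`, recursing into children.
 * Start with *pos == 0 and size > 0. Returns 0 or -1 like put(). */
static int sexp_node(const struct node *n, char *buf, size_t size,
                     size_t *pos)
{
    if (put(buf, size, pos, "(") || put(buf, size, pos, n->name))
        return -1;
    for (const struct node *c = n->child; c; c = c->sibling)
        if (put(buf, size, pos, " ") || sexp_node(c, buf, size, pos))
            return -1;
    return put(buf, size, pos, ")");
}
```

Since the output is plain text, a payload built with a different compiler never has to agree with the producer on struct layout.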
Greg
* Greg Watson gwatson@lanl.gov [050118 15:56]:
The only issue really is what format to use for serialization. I'm leaning towards s-expressions for use with openbios. However, it's conceivable that different serialization methods could be provided for different payloads, though probably not desirable.
The easiest solution would be to provide a small set of C functions that can be linked to any payload that allow generic access to the device tree information. This could go as a small static library to utils/
Stefan
Greg Watson gwatson@lanl.gov writes:
On Jan 18, 2005, at 4:51 AM, Eric W. Biederman wrote:
The only issue really is what format to use for serialization. I'm leaning towards s-expressions for use with openbios. However, it's conceivable that different serialization methods could be provided for different payloads, though probably not desirable.
Please note this problem has two pieces: which information should we represent, and how should we represent it?
Devices on a motherboard are not necessarily connected as a tree. I have never seen a tree structured schematic :) So the actual layout of the data is to some extent secondary to the data structures we will use to represent that data.
So we need to drill down on that part as to how we represent logical devices and how we represent the logical connections between them.
Eric
On Jan 18, 2005, at 1:19 PM, Eric W. Biederman wrote:
Greg Watson gwatson@lanl.gov writes:
On Jan 18, 2005, at 4:51 AM, Eric W. Biederman wrote:
The only issue really is what format to use for serialization. I'm leaning towards s-expressions for use with openbios. However, it's conceivable that different serialization methods could be provided for different payloads, though probably not desirable.
Please note this problem has two pieces: which information should we represent, and how should we represent it?
Devices on a motherboard are not necessarily connected as a tree. I have never seen a tree structured schematic :) So the actual layout of the data is to some extent secondary to the data structures we will use to represent that data.
So we need to drill down on that part as to how we represent logical devices and how we represent the logical connections between them.
Eric
An arbitrary graph seems to be adding additional complexity that we don't really need. Do you have an example of where a tree won't actually suffice?
The nice thing about s-expressions is that it deals with both the structure and representation.
Greg
Greg Watson gwatson@lanl.gov writes:
An arbitrary graph seems to be adding additional complexity that we don't really need. Do you have an example of where a tree won't actually suffice?
The way interrupts are hooked up on almost every board I have seen, including DEC Alphas.
When you add to that the problems of designing something that is multiplatform and must continue to use whatever format we settle upon potentially forever we need to be careful.
I admit that an arbitrary graph between devices is a little harder to work with. Largely, however, the difference is about 5 lines of code.
The big advantage a tree has is for internal processing because you get a well defined order for operating on it. A graph does not have this property but as most of the structure will be a tree extracting the appropriate trees should not be difficult for a client.
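A small sketch of what "extracting the appropriate tree" might look like for a client, assuming the graph arrives as a flat list of typed edges between device indices. The `edge` type, the edge kinds and `extract_tree` are hypothetical names.

```c
#include <stddef.h>

#define NO_PARENT (-1)

/* Hypothetical edge kinds in the exported graph. */
enum edge_kind { EDGE_BUS, EDGE_IRQ };

/* One connection between two devices, identified by index. */
struct edge {
    int from;
    int to;
    enum edge_kind kind;
};

/* Build a parent[] array from the edges of one kind only.
 * Returns 0 on success, or -1 if an index is out of range or some
 * device would get two parents (so those edges do not form a tree).
 * Cycles are not detected in this sketch. */
static int extract_tree(const struct edge *edges, size_t n_edges,
                        enum edge_kind kind, int n_devices, int *parent)
{
    for (int i = 0; i < n_devices; i++)
        parent[i] = NO_PARENT;

    for (size_t i = 0; i < n_edges; i++) {
        const struct edge *e = &edges[i];

        if (e->kind != kind)
            continue;
        if (e->from < 0 || e->from >= n_devices ||
            e->to < 0 || e->to >= n_devices)
            return -1;
        if (parent[e->to] != NO_PARENT)
            return -1;
        parent[e->to] = e->from;
    }
    return 0;
}
```

A client that wants the well-defined processing order of a tree asks for the bus edges; the interrupt edges stay available for whoever needs the routing.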
The nice thing about s-expressions is that it deals with both the structure and representation.
None of this prevents us from using s-expressions; it simply means we can't use the most obvious application of s-expressions.
Eric