For the longest time I have been fighting with what to do about machines that have 4GB or more of memory and large memory mapped I/O resources, as the conflict in memory space can cause a loss of 1/2GB of memory, which does not tend to be good for performance.
Looking at it, the large memory mapped I/O resources all have 64bit BARs. So I am just going to modify the LinuxBIOS resource allocator to put 64bit BARs, on x86-64 boxes at least, above the top of memory.
There are a couple of short term hacks that I can do with relocating memory but on the Opteron the more memory I add the more problems I run into. And moving BARS above 4G looks like the right term fix.
Any objections?
Eric
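To make the memory loss described above concrete, here is a rough sketch of the arithmetic. It assumes the MMIO window sits directly below the 4GB boundary and that RAM is not remapped above it; the function name and the example sizes are illustrative only, not taken from LinuxBIOS.

```python
GIB = 1 << 30
MIB = 1 << 20

def ram_hidden_by_mmio(ram_bytes, mmio_below_4g_bytes):
    """Return how many bytes of RAM are shadowed by MMIO below 4 GiB.

    Assumption: the MMIO window ends exactly at 4 GiB and RAM that
    collides with it is simply lost (no hoisting above 4 GiB).
    """
    four_gib = 4 * GIB
    mmio_base = four_gib - mmio_below_4g_bytes
    # RAM that would occupy [mmio_base, 4 GiB) is hidden behind the window.
    return max(0, min(ram_bytes, four_gib) - mmio_base)

# 4 GiB of RAM with a 512 MiB MMIO window below 4 GiB.
lost = ram_hidden_by_mmio(4 * GIB, 512 * MIB)
# If half of that window were 64bit BARs moved above RAM, the loss halves.
lost_after = ram_hidden_by_mmio(4 * GIB, 256 * MIB)
```

A machine with 2GB of RAM loses nothing under this model, which is why the problem only shows up as memory sizes approach 4GB.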
* Eric W. Biederman ebiederman@lnxi.com [040601 08:05]:
Looking the large memory mapped I/O resources all have 64bit BARs. So I am just going to modify the LinuxBIOS resource allocator to put 64bit BARs on x86-64 boxes at least above the top of memory.
There are a couple of short term hacks that I can do with relocating memory but on the Opteron the more memory I add the more problems I run into. And moving BARS above 4G looks like the right term fix.
When running a 32bit operating system (like linux/x86) does this mean that I am able to see more of my memory, but my pci devices will vanish?
Stefan
Stefan Reinauer stepan@openbios.org writes:
* Eric W. Biederman ebiederman@lnxi.com [040601 08:05]:
Looking the large memory mapped I/O resources all have 64bit BARs. So I am just going to modify the LinuxBIOS resource allocator to put 64bit BARs on x86-64 boxes at least above the top of memory.
There are a couple of short term hacks that I can do with relocating memory but on the Opteron the more memory I add the more problems I run into. And moving BARS above 4G looks like the right term
^long
fix.
When running a 32bit operating system (like linux/x86) does this mean that I am able to see more of my memory, but my pci devices will vanish?
Essentially. Although the pci devices don't vanish, you likely can't use them because some of their BARs have values that 32bit kernels can't cope with. lspci should let you see that though. A 32bit kernel with PAE enabled could cope, but no one has written the code.
I don't have a problem with adding an option to disable this for 32bit kernels. I am just tired of optimizing 64bit machines for 32bit kernels.
I need to be a little bit careful here as AMD's IO-APICs have 64bit BARs but are only 4K. So I might want to put in a size filter. Plus I need to ensure the bridges have enough resources to move things.
Eric
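A size filter of the kind mentioned above might look roughly like the sketch below. The bit layout of the low BAR dword (bit 0 for I/O space, bits 2:1 for the memory type, bit 3 for prefetchable) follows the PCI specification; the 1MB threshold and the function names are assumptions made up for illustration, not what the LinuxBIOS allocator does.

```python
PCI_BAR_IO = 0x1          # bit 0: 1 = I/O space, 0 = memory space
PCI_BAR_TYPE_MASK = 0x6   # bits 2:1: memory BAR type
PCI_BAR_TYPE_64 = 0x4     # 0b10 in bits 2:1 marks a 64bit memory BAR
PCI_BAR_PREFETCH = 0x8    # bit 3: prefetchable

# Hypothetical threshold: anything smaller stays below 4G.
MIN_SIZE_TO_MOVE = 1 << 20

def is_64bit_mem_bar(bar_low):
    """True if the low BAR dword describes a 64bit memory BAR."""
    is_memory = (bar_low & PCI_BAR_IO) == 0
    return is_memory and (bar_low & PCI_BAR_TYPE_MASK) == PCI_BAR_TYPE_64

def should_move_above_4g(bar_low, size, min_size=MIN_SIZE_TO_MOVE):
    """Only large 64bit memory BARs are candidates for relocation."""
    return is_64bit_mem_bar(bar_low) and size >= min_size
```

With this filter a 4K 64bit BAR such as the IO-APIC one stays where 32bit code can reach it, while a 256MB 64bit region becomes a candidate for moving.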
* Eric W. Biederman ebiederman@lnxi.com [040601 09:53]:
Essentially. Although the pci devices don't vanish, you likely can't use them because some of their BARs have values that 32bit kernels can't cope with. lspci should let you see that though. A 32bit kernel with PAE enabled could cope, but no one has written the code.
Are there any drawbacks to that solution? This should be addressed to the Linux kernel developers. Given the advantages this brings, this would probably be something people would like to see.
I don't have a problem with adding an option to disable this for 32bit kernels. I am just tired of optimizing 64bit machines for 32bit kernels.
The silver bullet would be an option in the cmos option table to trigger one or the other behaviour. Or maybe it is enough to just put the PCI BARs into 32bit address space whenever they fit there, taking the amount of RAM into account. I agree it does not make sense to _optimize_ for 32bit kernels. Essentially there are two groups of potential LinuxBIOS users in this case: Those with small amounts of RAM (e.g. amd64 based embedded solutions) might use 32bit code because it's easy to test it on many PCs. Those with large amounts of RAM don't want to use a 32bit kernel anyway. With more than about 1G of RAM, 32bit Linux performance breaks down noticeably compared to the 64bit variant.
Stefan
Stefan Reinauer stepan@openbios.org writes:
* Eric W. Biederman ebiederman@lnxi.com [040601 09:53]:
Essentially. Although the pci devices don't vanish, you likely can't use them because some of their BARs have values that 32bit kernels can't cope with. lspci should let you see that though. A 32bit kernel with PAE enabled could cope, but no one has written the code.
Are there any drawbacks to that solution? This should be addressed to the Linux kernel developers. Given the advantages this brings, this would probably be something people would like to see.
struct resource {} needs to be defined in terms of u64 instead of unsigned long. I know just that tweak was discussed and there were some problems with that.
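As a rough illustration of why the field width matters: on a 32bit kernel an unsigned long is 32 bits wide, so an address above 4G cannot be stored in it without losing the upper bits. The sketch below models the two widths with plain masks; the example address and the helper names are made up for illustration.

```python
MASK_32 = (1 << 32) - 1
MASK_64 = (1 << 64) - 1

def store_as_unsigned_long_32(addr):
    """Model storing an address in a 32bit unsigned long (upper bits lost)."""
    return addr & MASK_32

def store_as_u64(addr):
    """Model storing an address in a u64 (full value kept)."""
    return addr & MASK_64

# Hypothetical BAR placed at 5 GiB, i.e. above the 32bit limit.
bar_addr = 0x1_4000_0000

truncated = store_as_unsigned_long_32(bar_addr)
preserved = store_as_u64(bar_addr)
```

The truncated value points at an unrelated address below 4G, which is the kind of BAR value a 32bit kernel cannot cope with.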
I don't have a problem with adding an option to disable this for 32bit kernels. I am just tired of optimizing 64bit machines for 32bit kernels.
The silver bullet would be an option in the cmos option table to trigger the one or the other behaviour.
Right that is what I was thinking.
Or maybe it is enough to just put the PCI BARs into 32bit address space whenever it fits there, taking the amount of RAM into regard.
Except for some very small filters, that starts requiring a filter that is too smart. I think ultimately it is going to be the bridge BARs that are going to limit me. So I may only be able to apply this to prefetchable memory regions. But that should still cover the large mmio regions with 64bit BARs.
I agree it does not make sense to _optimize_ for 32bit kernels. Essentially there are two groups of potential LinuxBIOS users in this case: Those with small amounts of RAM (e.g. amd64 based embedded solutions) might use 32bit code because it's easy to test it on many PCs. Those with large amounts of RAM don't want to use a 32bit kernel anyway. With more than about 1G of RAM, 32bit Linux performance breaks down noticeably compared to the 64bit variant.
Interesting. Most of my customers have compute intensive apps and I don't think they see the breakdown anywhere near that soon.
Eric
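Putting the pieces of this message together, the policy could be sketched as below: an option (standing in for the cmos setting) gates the behaviour, and only prefetchable 64bit BARs are placed above both the top of memory and 4G. All names, the tuple layout, and the largest-first ordering are assumptions for illustration; the real allocator also has to program the bridge windows, which this sketch ignores.

```python
FOUR_GIB = 1 << 32

def align_up(value, alignment):
    """Round value up to a multiple of alignment (a power of two)."""
    return (value + alignment - 1) & ~(alignment - 1)

def place_high_bars(top_of_memory, bars, allow_above_4g):
    """Assign bases above 4 GiB to prefetchable 64bit BARs.

    bars is a list of (name, size, is_64bit, is_prefetchable) tuples,
    with each size a power of two. Returns {name: base}. BARs not in
    the result are left to the normal below-4G allocation.
    """
    placed = {}
    if not allow_above_4g:
        # Option disabled: keep everything where 32bit kernels can see it.
        return placed
    cursor = max(top_of_memory, FOUR_GIB)
    # Largest first keeps alignment padding small.
    for name, size, is_64bit, is_prefetchable in sorted(bars, key=lambda b: -b[1]):
        if is_64bit and is_prefetchable:
            base = align_up(cursor, size)  # BARs are naturally aligned
            placed[name] = base
            cursor = base + size
    return placed

# Hypothetical device list for a node whose RAM ends at 6 GiB.
example_bars = [
    ("infiniband", 256 << 20, True, True),
    ("ioapic", 4096, True, False),
    ("nic", 128 << 10, False, False),
]
high = place_high_bars(6 << 30, example_bars, allow_above_4g=True)
```

With the option off the function returns an empty mapping, which corresponds to the default suggested for 32bit kernels.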
just FYI, making BARS live above the 32-bit limit will break every single linux cluster here at LANL, and will also disable Plan 9.
With BARS under the 32-bit limit, you can boot anything. With BARS above that limit, you immediately limit what you can boot.
I think what really ought to happen is we fix linux early boot to do what we wanted in the beginning of this project 5 years ago: linux should do BAR optimization for the kernel that is running.
In principle, I like your BAR fix, but setting up BARS that are optimized for 64-bit kernels should be a (normally disabled) option.
I wonder if we can put BAR reallocation into a payload?
thanks
ron
ron minnich rminnich@lanl.gov writes:
just FYI, making BARS live above the 32-bit limit will break every single linux cluster here at LANL, and will also disable Plan 9.
Only for the hardware where the BARs move.
With BARS under the 32-bit limit, you can boot anything. With BARS above that limit, you immediately limit what you can boot.
Yes. And largely I see that as a good thing.
I think what really ought to happen is we fix linux early boot to do what we wanted in the beginning of this project 5 years ago: linux should do BAR optimization for the kernel that is running.
There are enough other bits you need to twiddle I don't think you can implement that generically.
In principle, I like your BAR fix, but setting up BARS that are optimized for 64-bit kernels should be a (normally disabled) option.
For old clusters I agree that it should be normally disabled.
I wonder if we can put BAR reallocation into a payload?
The amount of code should be just a handful of lines so I don't think that will be necessary. Especially since I have to go through a very limited number of bridge BARs.
Eric
On 1 Jun 2004, Eric W. Biederman wrote:
ron minnich rminnich@lanl.gov writes:
just FYI, making BARS live above the 32-bit limit will break every single linux cluster here at LANL, and will also disable Plan 9.
Only for the hardware where the BARs move.
Guess I missed something in your writeup. Are you saying that moving all BARs above 2^32 won't cause trouble for a 32-bit mode linux or freebsd or plan 9 or ...?
With BARS under the 32-bit limit, you can boot anything. With BARS above that limit, you immediately limit what you can boot.
Yes. And largely I see that as a good thing.
But do the potential users of linuxbios systems see that as a good thing? I understand the motivation, but sometimes customers (including us) do silly things, such as run K8s in 32-bit mode (it's actually a good thing as a K8 is a better Xeon than a Xeon, at least for our programs).
As long as it is an option I don't mind however.
In principle, I like your BAR fix, but setting up BARS that are optimized for 64-bit kernels should be a (normally disabled) option.
For old clusters I agree that it should be normally disabled.
for our new cluster (Lightning) first boot is into 32-bit linux for now, and will likely stay that way for a while as that system runs as 32-bits (not my decision ...)
This change can hurt new clusters.
ron
ron minnich rminnich@lanl.gov writes:
On 1 Jun 2004, Eric W. Biederman wrote:
ron minnich rminnich@lanl.gov writes:
just FYI, making BARS live above the 32-bit limit will break every single linux cluster here at LANL, and will also disable Plan 9.
Only for the hardware where the BARs move.
guess I missed something in your writeup. Are you saying that moving all BARs above 2^32 won't cause trouble for a 32-bit mode linux or freebsd or plan 9 or ...
You can't move all BARs, just 64bit BARs. So you will lose some devices like infiniband but not everything. And because of the limited BARs to filter things through on logical pci-pci bridges, I can't throw every 64bit BAR above 4G.
With BARS under the 32-bit limit, you can boot anything. With BARS above that limit, you immediately limit what you can boot.
Yes. And largely I see that as a good thing.
But do the potential users of linuxbios systems see that as a good thing? I understand the motivation, but sometimes customers (including us) do silly things, such as run K8s in 32-bit mode (it's actually a good thing as a K8 is a better Xeon than a Xeon, at least for our programs).
I had customers 3 years ago who wanted 6GB of RAM. I have been telling salesmen that they can't sell nodes with more than 4GB of RAM several times a year since then, because you can't use more RAM than that in 2 MPI processes. Customers coming from 64bit boxes don't get why commodity machines have silly size limits on things that the hardware can reasonably do.
So the demand exists.
As long as it is an option I don't mind however.
At this stage it would be silly not to have an option.
In principle, I like your BAR fix, but setting up BARS that are optimized for 64-bit kernels should be a (normally disabled) option.
For old clusters I agree that it should be normally disabled.
for our new cluster (Lightning) first boot is into 32-bit linux for now, and will likely stay that way for a while as that system runs as 32-bits (not my decision ...)
This change can hurt new clusters.
We are still in the transition where 64bit support is maturing, but if we don't push forward we will hurt more. x86 boxes have been > 32bit for quite a while.
It is a lot easier for me to explain why things suck in a 32bit kernel and not in a 64bit kernel than the other way around.
As for things like beoboot, some news:
1) Kexec is slowly making its way into the kernel; the system call number is reserved in Linus's tree as of 2.6.7-rc1.
2) Getting a kexec that will work on x86-64 is one of the things high on my TODO list. It really isn't going to be hard, I just need to make a little time.
3) There already exist kexec ports to ppc32 and ppc64.
I have no intention of pushing stable production machines to 64bits. I might encourage it, but things that work don't need to be messed with. For new machines, especially looking into next year, I want 64bit kernels. Even if they have a 32bit user space I want 64bit kernels.
Eric
On 1 Jun 2004, Eric W. Biederman wrote:
At this stage it would be silly not to have an option.
well then we are on the same page, I'm looking forward to having it boot with no legacy 32-bit nonsense in there :-)
ron
On 1 Jun 2004, Eric W. Biederman wrote:
There are a couple of short term hacks that I can do with relocating memory but on the Opteron the more memory I add the more problems I run into. And moving BARS above 4G looks like the right term fix.
Any objections?
How are we going to cover the case of booting 32-bit kernels?
ron