Someone mentioned that NUMA support for dual-core Opteron needs ACPI support in LinuxBIOS.
There may be some other solutions for that.
1. PowerPC already supports dual core and it should support NUMA, so Open Firmware must have some NUMA entry definition. Can we make the x86-64 kernel support the Open Firmware interface, so that we can use OpenBIOS as a payload of LinuxBIOS?
2. Enable ACPI and add the NUMA entries to it; the Linux kernel will be happy.
3. If we are trying to use ADLO to load Windows/Solaris/FreeBSD, we need to pass the related ACPI info to ADLO....
Solution 1 would be the ideal one, and could make Solaris for x86-64 use the Open Firmware interface too.....
Which one is better?
YH
On Thu, 2005-07-14 at 10:58 -0700, yhlu wrote:
Someone mentioned that NUMA support for dual-core Opteron needs ACPI support in LinuxBIOS.
There may be some other solutions for that.
1. PowerPC already supports dual core and it should support NUMA, so Open Firmware must have some NUMA entry definition. Can we make the x86-64 kernel support the Open Firmware interface, so that we can use OpenBIOS as a payload of LinuxBIOS?
2. Enable ACPI and add the NUMA entries to it; the Linux kernel will be happy.
3. If we are trying to use ADLO to load Windows/Solaris/FreeBSD, we need to pass the related ACPI info to ADLO....
Solution 1 would be the ideal one, and could make Solaris for x86-64 use the Open Firmware interface too.....
Which one is better?
AFAIK, the x86_64 kernel will try to read the NUMA configuration from the HW directly. We don't have to export any ACPI table.
AFAIK, the x86_64 kernel will try to read the NUMA configuration from the HW directly. We don't have to export any ACPI table.
It doesn't work for dual core or 8 sockets for some reason. Since the SRAT code works fine and is in general more future proof we never tracked down why. Patches welcome.
However you'll likely need ACPI for other reasons anyways, e.g. for better power saving.
-Andi
FYI, on a Tyan S4881 (8-way, dual-core 875 CPUs) with LinuxBIOS I got the log below; the 1G mem hole is also enabled.
So the kernel should be OK with reading the NUMA configuration directly from HW.
YH
Firmware type: LinuxBIOS
old bootloader convention, maybe loadlin?
Bootdata ok (command line is apic=debug ramdisk_size=65536 root=/dev/ram0 rw console=tty0 console=ttyS0,115200n8 )
Linux version 2.6.12 (root@tst288xsuse9) (gcc version 3.3.3 (SuSE Linux)) #8 SMP Fri Jun 24 12:41:43 PDT 2005
BIOS-provided physical RAM map:
BIOS-e820: 0000000000000000 - 0000000000000e48 (reserved)
BIOS-e820: 0000000000000e48 - 00000000000a0000 (usable)
BIOS-e820: 00000000000f0000 - 00000000000f0400 (reserved)
BIOS-e820: 0000000000100000 - 00000000c0000000 (usable)
BIOS-e820: 0000000100000000 - 0000000840000000 (usable)
Scanning NUMA topology in Northbridge 24
Number of nodes 8
Node 0 MemBase 0000000000000000 Limit 000000013fffffff
Node 1 MemBase 0000000140000000 Limit 000000023fffffff
Node 2 MemBase 0000000240000000 Limit 000000033fffffff
Node 3 MemBase 0000000340000000 Limit 000000043fffffff
Node 4 MemBase 0000000440000000 Limit 000000053fffffff
Node 5 MemBase 0000000540000000 Limit 000000063fffffff
Node 6 MemBase 0000000640000000 Limit 000000073fffffff
Node 7 MemBase 0000000740000000 Limit 000000083fffffff
node 1 shift 24 addr 140000000 conflict 0
node 1 shift 25 addr 1fe000000 conflict 0
node 3 shift 26 addr 3fc000000 conflict 0
node 7 shift 27 addr 7f8000000 conflict 0
Using node hash shift of 28
Bootmem setup node 0 0000000000000000-000000013fffffff
Bootmem setup node 1 0000000140000000-000000023fffffff
Bootmem setup node 2 0000000240000000-000000033fffffff
Bootmem setup node 3 0000000340000000-000000043fffffff
Bootmem setup node 4 0000000440000000-000000053fffffff
Bootmem setup node 5 0000000540000000-000000063fffffff
Bootmem setup node 6 0000000640000000-000000073fffffff
Bootmem setup node 7 0000000740000000-000000083fffffff
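For readers following the "read NUMA directly from HW" point: the MemBase/Limit pairs in the log above are, as far as I understand the K8 documentation, decoded from per-node DRAM base/limit register pairs in the northbridge's address-map function. The Python sketch below shows that decoding under an assumed bit layout (address bits 39:24 in register bits 31:16, read/write enable in base bits 1:0, destination node in limit bits 2:0). The function name and the two register values are hypothetical; the values were constructed to reproduce the "Node 1" line of the log and were not read from real hardware.

```python
def decode_k8_dram_range(base_reg, limit_reg):
    """Decode one DRAM base/limit register pair (assumed K8 layout, see text)."""
    # Assumption: bit 0 = read enable, bit 1 = write enable; both set means "in use".
    enabled = (base_reg & 0x3) == 0x3
    # Assumption: limit bits 2:0 name the node that owns this range.
    node = limit_reg & 0x7
    # Register bits 31:16 carry physical address bits 39:24.
    base = (base_reg >> 16) << 24
    # The limit is inclusive, so the low 24 address bits are all ones.
    limit = ((limit_reg >> 16) << 24) | 0xFFFFFF
    return enabled, node, base, limit


# Hypothetical register contents matching "Node 1" in the S4881 log.
enabled, node, base, limit = decode_k8_dram_range(0x01400003, 0x023F0001)
print("Node %d MemBase %016x Limit %016x" % (node, base, limit))
```

This only illustrates why no ACPI table is needed for the basic node ranges; it says nothing about why the hardware scan reportedly fails on some dual-core or 8-socket systems.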
in LB:
Setting variable MTRR 0, base: 0MB, range: 32768MB, type WB
Setting variable MTRR 1, base: 32768MB, range: 1024MB, type WB
Setting variable MTRR 2, base: 3072MB, range: 1024MB, type UC
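The MTRR setup above relies on overlap: MTRR 2 (UC) sits inside MTRR 0 (WB), and where variable MTRRs overlap, UC takes precedence. A minimal Python sketch of that rule, using the three ranges from the LinuxBIOS output; the `Mtrr` type and `effective_type` helper are made up for illustration, only WB and UC are handled, and the UC default for addresses not covered by any MTRR is an assumption.

```python
from typing import NamedTuple


class Mtrr(NamedTuple):
    base_mb: int
    size_mb: int
    mtype: str  # "WB" or "UC" only in this sketch


# Values copied from the LinuxBIOS "Setting variable MTRR" lines.
S4881_MTRRS = [
    Mtrr(0, 32768, "WB"),
    Mtrr(32768, 1024, "WB"),
    Mtrr(3072, 1024, "UC"),
]


def effective_type(addr_mb, mtrrs, default="UC"):
    """Return the memory type for an address given in MB."""
    hits = [m.mtype for m in mtrrs
            if m.base_mb <= addr_mb < m.base_mb + m.size_mb]
    if not hits:
        return default  # assumed default type, not taken from the log
    # UC wins over WB wherever ranges overlap.
    return "UC" if "UC" in hits else hits[0]


for addr in (1024, 3072, 4096, 33000):
    print(addr, effective_type(addr, S4881_MTRRS))
```

So the 3072MB to 4096MB window (the mem hole used for PCI space) ends up uncached while the RAM remapped above 4G stays write-back.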
On Thu, 14 Jul 2005, Andi Kleen wrote:
However you'll likely need ACPI for other reasons anyways, e.g. for better power saving.
bummer. What the BIOS vendors are doing (to lock in proprietary BIOS, some say) is making ACPI tables copyright the BIOS vendor, not the motherboard vendor. So LinuxBIOS will have to reverse engineer their own, somehow.
Shame we can't free ourselves of ACPI a bit. Oh well.
ron
On Thu, Jul 14, 2005 at 01:04:26PM -0600, Ronald G. Minnich wrote:
On Thu, 14 Jul 2005, Andi Kleen wrote:
However you'll likely need ACPI for other reasons anyways, e.g. for better power saving.
bummer. What the BIOS vendors are doing (to lock in proprietary BIOS, some say) is making ACPI tables copyright the BIOS vendor, not the motherboard vendor. So LinuxBIOS will have to reverse engineer their own, somehow.
You don't need full support; much of it is optional and the kernel will fall back to the old methods if it is not available. E.g. you can probably leave out most of the PCI support if you don't want to support PCI hotplug. Longer term it might be needed again for power management though.
Doing PST objects for power saving shouldn't be that difficult, but you need knowledge of the CPUs from their data sheet (and some testing of whether the power regulators on the mobo can take all the transitions). But it shouldn't be very motherboard specific.
However that said there is a lot of useful information in the FADT and some other tables and I definitely plan to use more of it in the future.
-Andi
On Thu, 2005-07-14 at 20:48 +0200, Andi Kleen wrote:
AFAIK, the x86_64 kernel will try to read the NUMA configuration from the HW directly. We don't have to export any ACPI table.
It doesn't work for dual core or 8 sockets for some reason. Since the SRAT code works fine and is in general more future proof we never tracked down why. Patches welcome.
Does each core have its own memory controller?
However you'll likely need ACPI for other reasons anyways, e.g. for better power saving.
-Andi
For the S2895, with 1Gx8 installed and E-stepping dual-core Opterons with the 1G mem hole enabled, I got
Bootdata ok (command line is apic=debug ramdisk_size=65536 root=/dev/ram0 rw console=tty0 console=ttyS0,115200n8 )
Linux version 2.6.12-rc5 (root@tst288xsuse9) (gcc version 3.3.3 (SuSE Linux)) #26 SMP Thu Jun 2 18:15:44 PDT 2005
BIOS-provided physical RAM map:
BIOS-e820: 0000000000000000 - 0000000000000dd0 (reserved)
BIOS-e820: 0000000000000dd0 - 00000000000a0000 (usable)
BIOS-e820: 00000000000f0000 - 00000000000f0400 (reserved)
BIOS-e820: 0000000000100000 - 00000000c0000000 (usable)
BIOS-e820: 0000000100000000 - 0000000240000000 (usable)
ACPI: Unable to locate RSDP
Scanning NUMA topology in Northbridge 24
Number of nodes 2
Node 0 MemBase 0000000000000000 Limit 000000013fffffff
Node 1 MemBase 0000000140000000 Limit 000000023fffffff
node 1 shift 24 addr 140000000 conflict 0
node 1 shift 25 addr 1fe000000 conflict 0
Using node hash shift of 26
Bootmem setup node 0 0000000000000000-000000013fffffff
Bootmem setup node 1 0000000140000000-000000023fffffff
~ # cat /proc/mtrr
reg00: base=0x00000000 ( 0MB), size=8192MB: write-back, count=1
reg01: base=0x200000000 (8192MB), size=1024MB: write-back, count=1
reg02: base=0xc0000000 (3072MB), size=1024MB: uncachable, count=1
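To make the effect of the 1G mem hole in the S2895 log concrete, the sketch below intersects the e820 "usable" ranges with the two node ranges reported by the kernel. All addresses are copied from the log above; the helper name is made up, and treating the e820 end addresses as exclusive and the node limits as inclusive is my reading of the log format.

```python
# (start, end) with end exclusive, from the BIOS-e820 "usable" lines.
E820_USABLE = [
    (0x0000000000000DD0, 0x00000000000A0000),
    (0x0000000000100000, 0x00000000C0000000),
    (0x0000000100000000, 0x0000000240000000),
]

# node -> (MemBase, Limit) with Limit inclusive, from the "Node N" lines.
NODES = {
    0: (0x0000000000000000, 0x000000013FFFFFFF),
    1: (0x0000000140000000, 0x000000023FFFFFFF),
}


def usable_bytes(node_base, node_limit, ranges):
    """Sum the usable e820 bytes that fall inside one node's address range."""
    node_end = node_limit + 1
    total = 0
    for start, end in ranges:
        lo, hi = max(start, node_base), min(end, node_end)
        if hi > lo:
            total += hi - lo
    return total


for node, (base, limit) in sorted(NODES.items()):
    span = limit + 1 - base
    ram = usable_bytes(base, limit, E820_USABLE)
    print("node %d: spans %d MB of address space, %d MB usable"
          % (node, span >> 20, ram >> 20))
```

Node 0 covers 5G of address space but holds just under 4G of usable RAM, because the range 0xc0000000 to 0x100000000 is the hole and the RAM behind it is remapped above 4G; node 1 holds exactly 4G.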
with suse kernel and LinuxBIOS + S2882 I can get
linux:~ # cat /proc/meminfo
total: used: free: shared: buffers: cached:
Mem: 8131170304 145051648 7986118656 0 15220736 66543616
Swap: 2154979328 0 2154979328
MemTotal: 7940596 kB
MemFree: 7798944 kB
MemShared: 0 kB
Buffers: 14864 kB
Cached: 64984 kB
SwapCached: 0 kB
Active: 39056 kB
Inactive: 40844 kB
HighTotal: 0 kB
HighFree: 0 kB
LowTotal: 7940596 kB
LowFree: 7798944 kB
SwapTotal: 2104472 kB
SwapFree: 2104472 kB
BigFree: 0 kB

Node 0 MemTotal: 4194300 kB
Node 0 MemFree: 3793788 kB
Node 0 MemUsed: 400512 kB
Node 0 HighTotal: 0 kB
Node 0 HighFree: 0 kB
Node 0 LowTotal: 4194300 kB
Node 0 LowFree: 3793788 kB

Node 1 MemTotal: 4194300 kB
Node 1 MemFree: 4005156 kB
Node 1 MemUsed: 189144 kB
Node 1 HighTotal: 0 kB
Node 1 HighFree: 0 kB
Node 1 LowTotal: 4194300 kB
Node 1 LowFree: 4005156 kB
I wonder if the SUSE kernel has some special code to show that. With the kernel.org kernel, I cannot see the Node 0 ... Node 1 ... entries.
YH
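On the question of where the per-node numbers show up: on 2.6 kernels built with NUMA support, the same "Node N ..." lines are, to my knowledge, exposed per node under sysfs rather than in /proc/meminfo. A small Python sketch that collects them, assuming sysfs is mounted at /sys and the files use the "Node N Key: value kB" format seen in the SUSE output above; the function names are made up for illustration.

```python
import glob
import re

# Matches lines such as "Node 0 MemTotal: 4194300 kB".
NODE_LINE = re.compile(r"Node\s+(\d+)\s+(\w+):\s+(\d+)\s+kB")


def parse_node_meminfo(text):
    """Return {node: {key: kB}} for every 'Node N Key: value kB' line."""
    result = {}
    for match in NODE_LINE.finditer(text):
        node, key, kb = int(match.group(1)), match.group(2), int(match.group(3))
        result.setdefault(node, {})[key] = kb
    return result


def read_all_nodes(pattern="/sys/devices/system/node/node*/meminfo"):
    """Read every per-node meminfo file matching the (assumed) sysfs path."""
    merged = {}
    for path in sorted(glob.glob(pattern)):
        with open(path) as handle:
            merged.update(parse_node_meminfo(handle.read()))
    return merged


if __name__ == "__main__":
    for node, fields in sorted(read_all_nodes().items()):
        print("node %d: MemTotal %s kB" % (node, fields.get("MemTotal")))
```

If the glob matches nothing, the kernel was probably built without NUMA support or sysfs is not mounted.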
On 7/14/05, Li-Ta Lo ollie@lanl.gov wrote:
AFAIK, the x86_64 kernel will try to read the NUMA configuration from the HW directly. We don't have to export any ACPI table.
-- Li-Ta Lo ollie@lanl.gov Los Alamos National Lab
P.S.: It is very nasty to cc closed mailing lists when posting to open ones. Please don't do that in the future.
-Andi
if there is any chance of getting along without ACPI entries that is best. Linux did do this once already, for SMP K8: K8 can boot and run NUMA without an SRAT table. What more is needed for dual core, and could Linux support in this area be extended?
ron