On Wed, Mar 26, 2014 at 03:58:50PM -0400, Gabriel L. Somlo wrote:
On Tue, Mar 18, 2014 at 07:23:17PM -0400, Gabriel L. Somlo wrote:
At this point, can anyone with access to a real, physical, NUMA system dump the smbios tables with dmidecode and post them here? I think that would be very informative.
So I thrashed around a bit trying to find a real NUMA box, and found a Dell R410 whose BIOS claims to support NUMA when the "Node Interleaving" option is disabled.
So, flipping that option on and off does indeed switch the system between NUMA and UMA modes. Running "numactl -H" in UMA mode gets us this output:
available: 1 nodes (0)
node 0 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23
node 0 size: 131059 MB
node 0 free: 127898 MB
node distances:
node   0
  0:  10
In NUMA mode, we get this:
available: 2 nodes (0-1)
node 0 cpus: 0 2 4 6 8 10 12 14 16 18 20 22
node 0 size: 65536 MB
node 0 free: 64047 MB
node 1 cpus: 1 3 5 7 9 11 13 15 17 19 21 23
node 1 size: 65523 MB
node 1 free: 63841 MB
node distances:
node   0   1
  0:  10  20
  1:  20  10
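As a quick sanity check on the two outputs above, the per-node sizes from NUMA mode can be added up and compared with the single node reported in UMA mode. This is only a sketch: the helper name node_sizes_mb is made up for illustration, and the input strings are trimmed copies of the "numactl -H" output quoted above.

```python
import re

# Trimmed copies of the "numactl -H" output quoted above. Only the
# "size" lines matter here; cpus/free/distances are left out.
UMA_OUTPUT = """\
available: 1 nodes (0)
node 0 size: 131059 MB
"""

NUMA_OUTPUT = """\
available: 2 nodes (0-1)
node 0 size: 65536 MB
node 1 size: 65523 MB
"""


def node_sizes_mb(text):
    """Return {node_id: size_in_MB} from 'node N size: X MB' lines.

    Hypothetical helper, written for this message only.
    """
    pattern = re.compile(r"^node (\d+) size: (\d+) MB$", re.MULTILINE)
    return {int(node): int(mb) for node, mb in pattern.findall(text)}


uma = node_sizes_mb(UMA_OUTPUT)
numa = node_sizes_mb(NUMA_OUTPUT)

print("UMA total :", sum(uma.values()), "MB")
print("NUMA total:", sum(numa.values()), "MB")
```

Both totals come out to 131059 MB, which is consistent with the BIOS option only changing how the same memory is divided among nodes.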
The "dmidecode" output stays unchanged between UMA and NUMA modes (showing only memory tables, types 16-19, and there is NO type 20):
Handle 0x1000, DMI type 16, 15 bytes
Physical Memory Array
	Location: System Board Or Motherboard
	Use: System Memory
	Error Correction Type: Multi-bit ECC
	Maximum Capacity: 128 GB
	Error Information Handle: Not Provided
	Number Of Devices: 8

Handle 0x1100, DMI type 17, 28 bytes
Memory Device
	Array Handle: 0x1000
	Error Information Handle: Not Provided
	Total Width: 72 bits
	Data Width: 64 bits
	Size: 16384 MB
	Form Factor: DIMM
	Set: 1
	Locator: DIMM_A1
	Bank Locator: Not Specified
	Type: DDR3
	Type Detail: Synchronous Registered (Buffered)
	Speed: 1066 MHz
	Manufacturer: 00CE00B380CE
	Serial Number: 44056D5E
	Asset Tag: 01105061
	Part Number: M393B2K70CM0-YF8
	Rank: 4

Handle 0x1101, DMI type 17, 28 bytes
Handle 0x1102, DMI type 17, 28 bytes
Handle 0x1103, DMI type 17, 28 bytes
Handle 0x1109, DMI type 17, 28 bytes
Handle 0x110A, DMI type 17, 28 bytes
Handle 0x110B, DMI type 17, 28 bytes
Handle 0x110C, DMI type 17, 28 bytes
/* all of them similar to 0x1100 above [GLS] */

Handle 0x1300, DMI type 19, 15 bytes
Memory Array Mapped Address
	Starting Address: 0x00000000000
	Ending Address: 0x000BFFFFFFF
	Range Size: 3 GB
	Physical Array Handle: 0x1000
	Partition Width: 2

Handle 0x1301, DMI type 19, 15 bytes
Memory Array Mapped Address
	Starting Address: 0x00100000000
	Ending Address: 0x0203FFFFFFF
	Range Size: 125 GB
	Physical Array Handle: 0x1000
	Partition Width: 2
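The arithmetic behind the two Type 19 ranges above can be checked with a few lines of Python. This is a rough sketch, not anything dmidecode does itself; the function name range_size_gib is invented here, and the DIMM total assumes all eight Type 17 devices are 16384 MB each (only 0x1100 is shown in full, the rest are described as "similar").

```python
GIB = 1 << 30

# Type 19 ranges copied from the dmidecode output above.
# Ending addresses are inclusive.
ranges = [
    (0x00000000000, 0x000BFFFFFFF),  # handle 0x1300
    (0x00100000000, 0x0203FFFFFFF),  # handle 0x1301
]


def range_size_gib(start, end):
    """Size in GiB of an inclusive [start, end] address range."""
    return (end - start + 1) / GIB


sizes = [range_size_gib(start, end) for start, end in ranges]
total_gib = sum(sizes)

# ASSUMPTION: eight DIMMs of 16384 MB each, as suggested by the
# "Number Of Devices: 8" field and the one Type 17 record shown in full.
dimm_total_gib = 8 * 16384 / 1024

print("Type 19 range sizes (GiB):", sizes)
print("Type 19 total (GiB):", total_gib)
print("Type 17 total (GiB):", dimm_total_gib)
```

Under that assumption the two Type 19 ranges add up to the same 128 GB as the DIMMs, with the second range starting at the 4 GB mark and ending at 129 GB.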
The output of "dmesg | grep -i e820" remains unchanged between UMA and NUMA modes:
e820: BIOS-provided physical RAM map:
BIOS-e820: [mem 0x0000000000000000-0x000000000009ffff] usable
BIOS-e820: [mem 0x0000000000100000-0x00000000bf378fff] usable
BIOS-e820: [mem 0x00000000bf379000-0x00000000bf38efff] reserved
BIOS-e820: [mem 0x00000000bf38f000-0x00000000bf3cdfff] ACPI data
BIOS-e820: [mem 0x00000000bf3ce000-0x00000000bfffffff] reserved
BIOS-e820: [mem 0x00000000e0000000-0x00000000efffffff] reserved
BIOS-e820: [mem 0x00000000fe000000-0x00000000ffffffff] reserved
BIOS-e820: [mem 0x0000000100000000-0x000000203fffffff] usable
e820: update [mem 0x00000000-0x0000ffff] usable ==> reserved
e820: remove [mem 0x000a0000-0x000fffff] usable
e820: last_pfn = 0x2040000 max_arch_pfn = 0x400000000
e820: update [mem 0xc0000000-0xffffffff] usable ==> reserved
e820: last_pfn = 0xbf379 max_arch_pfn = 0x400000000
e820: [mem 0xc0000000-0xdfffffff] available for PCI devices
PCI: MMCONFIG at [mem 0xe0000000-0xefffffff] reserved in E820
e820: reserve RAM buffer [mem 0xbf379000-0xbfffffff]
So, even in NUMA mode, there still appear to be only two contiguous E820_RAM type entries in the "sanitized" e820 table (the regions marked "usable" appear to be adjacent). And the E820_RAM contiguous regions are not per-node or anything like that. Not that this is to be taken as the "One True Way", but still, it's a data point.
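To illustrate the "not per-node" point, here is a small sketch that pulls the "usable" entries out of the BIOS-e820 lines quoted above and compares the largest one with node 0's size from "numactl -H". The names usable_regions and LINE_RE are made up for this message, and treating numactl's "MB" as 2**20 bytes is an assumption.

```python
import re

# BIOS-e820 lines copied from the dmesg output above.
BIOS_E820 = """\
BIOS-e820: [mem 0x0000000000000000-0x000000000009ffff] usable
BIOS-e820: [mem 0x0000000000100000-0x00000000bf378fff] usable
BIOS-e820: [mem 0x00000000bf379000-0x00000000bf38efff] reserved
BIOS-e820: [mem 0x00000000bf38f000-0x00000000bf3cdfff] ACPI data
BIOS-e820: [mem 0x00000000bf3ce000-0x00000000bfffffff] reserved
BIOS-e820: [mem 0x00000000e0000000-0x00000000efffffff] reserved
BIOS-e820: [mem 0x00000000fe000000-0x00000000ffffffff] reserved
BIOS-e820: [mem 0x0000000100000000-0x000000203fffffff] usable
"""

LINE_RE = re.compile(
    r"^BIOS-e820: \[mem (0x[0-9a-f]+)-(0x[0-9a-f]+)\] (.+)$", re.MULTILINE
)

MIB = 1 << 20
NODE0_MB = 65536  # "node 0 size" from numactl -H in NUMA mode


def usable_regions(text):
    """Return (start, end) pairs, end inclusive, for 'usable' entries.

    Hypothetical helper, written for this message only.
    """
    return [
        (int(start, 16), int(end, 16))
        for start, end, kind in LINE_RE.findall(text)
        if kind == "usable"
    ]


regions = usable_regions(BIOS_E820)
for start, end in regions:
    print(f"{start:#014x}-{end:#014x}  {(end - start + 1) / MIB:.2f} MiB")

# ASSUMPTION: numactl's "MB" means 2**20 bytes.
largest = max(end - start + 1 for start, end in regions)
spans_nodes = largest > NODE0_MB * MIB
print("largest usable region exceeds node 0:", spans_nodes)
```

The region above 4 GB is 125 GiB, which is larger than either node, so a single usable entry has to cover memory from both nodes.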
Anyhow, my original question/worry ("Once you create Type17 DIMMs from all the RAM, and Type19 regions for each E820_RAM type entry, how do you tie them together with Type20s?") has been answered ("You just don't" :) ).
Just figured I'd share this, maybe it can be useful to someone else besides me...
Cheers,
--Gabriel