Actually, the problem is caused by Andi's patch.
I was wondering why we cannot use this RAM.
It would be better to update the kernel instead.
YH
-----Original Message-----
From: linuxbios-bounces@linuxbios.org [mailto:linuxbios-bounces@linuxbios.org] On Behalf Of Roman Kononov
Sent: Wednesday, January 31, 2007 2:37 PM
To: LinuxBIOS
Subject: [LinuxBIOS] [PATCH] e820 table correction
Hello,
I have this situation:
Linuxbios boots an Opteron motherboard with 1GB memory.
Linuxbios directly loads a recent linux kernel. The memory layout is like this:
BIOS-provided physical RAM map:
BIOS-e820: 0000000000000000 - 0000000000000e18 (reserved)
BIOS-e820: 0000000000000e18 - 00000000000a0000 (usable)
BIOS-e820: 00000000000c0000 - 00000000000f0000 (usable)
BIOS-e820: 00000000000f0000 - 00000000000f0400 (reserved)
BIOS-e820: 00000000000f0400 - 0000000040000000 (usable)
The f0000-f0400 region contains IRQ and ACPI tables.
At some point the kernel builds a resource table containing all physical address ranges and the type of hardware each range is mapped to. The table is accessible via /proc/iomem:
# cat /proc/iomem
00000000-00000e17 : reserved
00000e18-0009ffff : System RAM
000a0000-000bffff : Video RAM area
000c0000-000cbfff : Video ROM
000f0000-000fffff : System ROM
e0000000-efffffff : PCI Bus #03
  e0000000-efffffff : 0000:03:00.0
f0000000-f3ffffff : GART
f4000000-f60fffff : PCI Bus #03
  f4000000-f4ffffff : 0000:03:00.0
  f5000000-f5ffffff : 0000:03:00.0
  f6000000-f601ffff : 0000:03:00.0
f6100000-f6100fff : 0000:00:01.0
f6101000-f6101fff : 0000:00:02.0
  f6101000-f6101fff : ohci_hcd
f6102000-f6102fff : 0000:00:04.0
f6103000-f6103fff : 0000:00:07.0
  f6103000-f6103fff : sata_nv
f6104000-f6104fff : 0000:00:08.0
  f6104000-f6104fff : sata_nv
f6105000-f6105fff : 0000:00:0a.0
f6106000-f61060ff : 0000:00:02.1
f6200000-f620ffff : 0000:40:01.0
As you can see, the 00000000000f0400-0000000040000000 region is not listed.
It is not listed because the kernel unconditionally adds "000f0000-000fffff : System ROM" first (look for "request_resource(&iomem_resource, &system_rom_resource)"), and the later attempt to add the f0400-40000000 range fails because it overlaps that entry.
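The overlap rule behind this can be illustrated with a small, self-contained model. It is only a sketch: the real kernel keeps a tree of struct resource nodes, and the names below (struct range, struct level, request_range) are hypothetical. What it assumes, as with request_resource() in the kernel, is that a new range is refused when it overlaps any existing sibling, even partially; there is no partial insertion.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified model of one level of the kernel's iomem
 * resource tree: a flat array of non-overlapping [start, end] ranges.
 * Real kernels use struct resource with parent/child/sibling links. */
struct range {
	unsigned long start, end;   /* inclusive, like /proc/iomem */
	const char *name;
};

#define MAX_RANGES 16

struct level {
	struct range r[MAX_RANGES];
	size_t n;
};

/* Mimics the sibling rule of request_resource(): the new range is
 * refused if it overlaps any existing sibling at all, even partially.
 * Returns 0 on success, -1 on conflict (the kernel returns -EBUSY). */
static int request_range(struct level *lv, unsigned long start,
			 unsigned long end, const char *name)
{
	for (size_t i = 0; i < lv->n; i++)
		if (start <= lv->r[i].end && end >= lv->r[i].start)
			return -1;
	if (lv->n == MAX_RANGES)
		return -1;
	lv->r[lv->n++] = (struct range){ start, end, name };
	return 0;
}
```

Replaying the map above in this model, requesting 0xf0000-0xfffff ("System ROM") first and then 0xf0400-0x3fffffff returns -1 for the RAM range, which is then simply not registered. A RAM range starting at 0x100000 would not collide.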
The kernel does not care that the range is not listed there. Kexec does. It uses the /proc/iomem file to instruct the kexec system call how to place the segments of a new kernel in physical memory. Kexec fails to start a new kernel because it cannot locate enough physical memory.
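A rough sketch of why a missing "System RAM" line matters to kexec: a loader that builds its memory map from /proc/iomem only sees what is listed there. The function below (system_ram_bytes, a hypothetical name) is not kexec-tools code; it simply sums the top-level "System RAM" entries of /proc/iomem text, skipping indented child entries such as "Kernel code".

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical sketch: sum the sizes of top-level "System RAM" lines in
 * /proc/iomem-style text. Nested (indented) lines are skipped. */
static unsigned long long system_ram_bytes(const char *iomem)
{
	const char *tag = "System RAM";
	unsigned long long total = 0;
	const char *line = iomem;

	while (*line) {
		const char *nl = strchr(line, '\n');
		size_t len = nl ? (size_t)(nl - line) : strlen(line);
		unsigned long long start, end;
		int name_off = 0;

		/* Top-level entries start in column 0: "start-end : name" */
		if (line[0] != ' ' &&
		    sscanf(line, "%llx-%llx : %n", &start, &end, &name_off) == 2 &&
		    name_off > 0 && (size_t)name_off <= len &&
		    len - (size_t)name_off >= strlen(tag) &&
		    strncmp(line + name_off, tag, strlen(tag)) == 0)
			total += end - start + 1;   /* ranges are inclusive */

		line = nl ? nl + 1 : line + len;
	}
	return total;
}
```

For the first /proc/iomem listing above, only 00000e18-0009ffff qualifies, i.e. 0x9f1e8 bytes (a little under 640 KB) on a 1 GB machine.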
This must be fixed either in linux or linuxbios.
Assuming that linuxbios is to be fixed, I cooked up a patch that produces this memory layout:
BIOS-provided physical RAM map:
BIOS-e820: 0000000000000000 - 0000000000000e18 (reserved)
BIOS-e820: 0000000000000e18 - 00000000000a0000 (usable)
BIOS-e820: 00000000000c0000 - 00000000000f0000 (usable)
BIOS-e820: 00000000000f0000 - 0000000000100000 (reserved)
BIOS-e820: 0000000000100000 - 0000000040000000 (usable)
The /proc/iomem contains:
# cat /proc/iomem
00000000-00000e17 : reserved
00000e18-0009ffff : System RAM
000a0000-000bffff : Video RAM area
000c0000-000cbfff : Video ROM
000f0000-000fffff : System ROM
00100000-3fffffff : System RAM
  00100000-00203c61 : Kernel code
  00203c62-00248c3f : Kernel data
e0000000-efffffff : PCI Bus #03
  e0000000-efffffff : 0000:03:00.0
f0000000-f3ffffff : GART
f4000000-f60fffff : PCI Bus #03
  f4000000-f4ffffff : 0000:03:00.0
  f5000000-f5ffffff : 0000:03:00.0
  f6000000-f601ffff : 0000:03:00.0
f6100000-f6100fff : 0000:00:01.0
f6101000-f6101fff : 0000:00:02.0
  f6101000-f6101fff : ohci_hcd
f6102000-f6102fff : 0000:00:04.0
f6103000-f6103fff : 0000:00:07.0
  f6103000-f6103fff : sata_nv
f6104000-f6104fff : 0000:00:08.0
  f6104000-f6104fff : sata_nv
f6105000-f6105fff : 0000:00:0a.0
f6106000-f61060ff : 0000:00:02.1
f6200000-f620ffff : 0000:40:01.0
Kexec is happier with the patch.
Regards,
Signed-off-by: Roman Kononov <kononov195-lbl@yahoo.com>
---
"Lu, Yinghai" yinghai.lu@amd.com writes:
Actually, the problem is caused by Andi's patch.
I was wondering why we cannot use this RAM.
It would be better to update the kernel instead.
Yes. It looks like we have a little fallout from this cleanup. There is a related issue as well: reserving only part of the first page also confuses things.
I will try to look into this soon if no one beats me to it, but I need a break from bug hunting for the moment.
Eric
On Saturday 03 February 2007 01:17, Lu, Yinghai wrote:
Actually, the problem is caused by Andi's patch.
What patch?
-Andi
commit dbf9272e863bf4b17ee8e3c66c26682b2061d40d
Author: Andi Kleen <ak@suse.de>
Date:   Tue Sep 26 10:52:36 2006 +0200
[PATCH] Don't force reserve the 640k-1MB range
x86-64 inherited code from i386 that force-reserves the 640k-1MB area. That was needed on some old systems.
But we generally trust the e820 map to be correct on 64bit systems and mark all areas that are not memory correctly.
This patch allows the real memory in there to be used.
Or rather the only way to find out if it's still needed is to try. So far I'm optimistic.
Signed-off-by: Andi Kleen <ak@suse.de>
diff --git a/arch/x86_64/kernel/e820.c b/arch/x86_64/kernel/e820.c
index 164d0b8..e06c271 100644
--- a/arch/x86_64/kernel/e820.c
+++ b/arch/x86_64/kernel/e820.c
@@ -71,12 +71,7 @@ static inline int bad_addr(unsigned long *addrp, unsigned long size)
 		return 1;
 	}
 #endif
-	/* kernel code + 640k memory hole (later should not be needed, but
-	   be paranoid for now) */
-	if (last >= 640*1024 && addr < 1024*1024) {
-		*addrp = 1024*1024;
-		return 1;
-	}
+	/* kernel code */
 	if (last >= __pa_symbol(&_text) && last < __pa_symbol(&_end)) {
 		*addrp = __pa_symbol(&_end);
 		return 1;
@@ -519,13 +514,6 @@ static int __init sanitize_e820_map(struct e820entry * biosmap, char * pnr_map)
  * If we're lucky and live on a modern system, the setup code
  * will have given us a memory map that we can use to properly
  * set up memory.  If we aren't, we'll fake a memory map.
- *
- * We check to see that the memory map contains at least 2 elements
- * before we'll use it, because the detection code in setup.S may
- * not be perfect and most every PC known to man has two memory
- * regions: one from 0 to 640k, and one from 1mb up.  (The IBM
- * thinkpad 560x, for example, does not cooperate with the memory
- * detection code.)
  */
 static int __init copy_e820_map(struct e820entry * biosmap, int nr_map)
 {
@@ -543,25 +531,6 @@ static int __init copy_e820_map(struct e820entry * biosmap, int nr_map)
 		if (start > end)
 			return -1;
 
-		/*
-		 * Some BIOSes claim RAM in the 640k - 1M region.
-		 * Not right. Fix it up.
-		 *
-		 * This should be removed on Hammer which is supposed to not
-		 * have non e820 covered ISA mappings there, but I had some strange
-		 * problems so it stays for now.  -AK
-		 */
-		if (type == E820_RAM) {
-			if (start < 0x100000ULL && end > 0xA0000ULL) {
-				if (start < 0xA0000ULL)
-					add_memory_region(start, 0xA0000ULL-start, type);
-				if (end <= 0x100000ULL)
-					continue;
-				start = 0x100000ULL;
-				size = end - start;
-			}
-		}
-
 		add_memory_region(start, size, type);
 	} while (biosmap++,--nr_map);
 	return 0;
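The hunk removed from copy_e820_map() presumably also explains why the problem appeared only after this commit: the old code moved the start of any RAM entry that reached into the 640k-1MB window up to 1MB, so the RAM resource never overlapped the System ROM reservation at 0xf0000. The following is a hypothetical standalone re-creation of that removed logic (old_clip and struct piece are illustrative names); instead of calling add_memory_region(), it returns the pieces it would have added.

```c
#include <assert.h>

#define E820_RAM 1   /* e820 type code for usable RAM */

/* One region the old code would have passed to add_memory_region(). */
struct piece { unsigned long long start, size; };

/* Hypothetical re-creation of the removed 640k-1MB clipping.
 * Returns the number of pieces written to out[] (0, 1 or 2). */
static int old_clip(unsigned long long start, unsigned long long size,
		    int type, struct piece out[2])
{
	unsigned long long end = start + size;
	int n = 0;

	if (type == E820_RAM && start < 0x100000ULL && end > 0xA0000ULL) {
		if (start < 0xA0000ULL)   /* keep the part below 640k */
			out[n++] = (struct piece){ start, 0xA0000ULL - start };
		if (end <= 0x100000ULL)   /* nothing left above 1MB */
			return n;
		start = 0x100000ULL;      /* drop the 640k-1MB part */
		size = end - start;
	}
	out[n++] = (struct piece){ start, size };
	return n;
}
```

Fed Roman's entry 0xf0400-0x40000000, it yields a single piece starting at 0x100000 with size 0x3ff00000, while the 0xc0000-0xf0000 entry is discarded entirely.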
On 2/2/07, Andi Kleen ak@suse.de wrote:
On Saturday 03 February 2007 01:17, Lu, Yinghai wrote:
Actually, the problem is caused by Andi's patch.
What patch?
-Andi
-- linuxbios mailing list linuxbios@linuxbios.org http://www.openbios.org/mailman/listinfo/linuxbios
On Sunday 04 February 2007 03:39, yhlu wrote:
commit dbf9272e863bf4b17ee8e3c66c26682b2061d40d
Author: Andi Kleen <ak@suse.de>
Date:   Tue Sep 26 10:52:36 2006 +0200
[PATCH] Don't force reserve the 640k-1MB range
Ok, but if that breaks LinuxBios then the problem is clearly in LinuxBIOS and needs to be fixed there.
-Andi
Andi Kleen wrote:
On Sunday 04 February 2007 03:39, yhlu wrote:
commit dbf9272e863bf4b17ee8e3c66c26682b2061d40d
Author: Andi Kleen <ak@suse.de>
Date:   Tue Sep 26 10:52:36 2006 +0200
[PATCH] Don't force reserve the 640k-1MB range
Ok, but if that breaks LinuxBios then the problem is clearly in LinuxBIOS and needs to be fixed there.
Why? Is LinuxBIOS breaking some standard here?
On Sunday 04 February 2007 11:45, Stefan Reinauer wrote:
Andi Kleen wrote:
On Sunday 04 February 2007 03:39, yhlu wrote:
commit dbf9272e863bf4b17ee8e3c66c26682b2061d40d
Author: Andi Kleen <ak@suse.de>
Date:   Tue Sep 26 10:52:36 2006 +0200
[PATCH] Don't force reserve the 640k-1MB range
Ok, but if that breaks LinuxBios then the problem is clearly in LinuxBIOS and needs to be fixed there.
Why? Is LinuxBIOS breaking some standard here?
If anything between 640K and 1MB isn't memory it should report that properly in the e820 map.
-Andi
On 2/4/07, Andi Kleen ak@suse.de wrote:
On Sunday 04 February 2007 11:45, Stefan Reinauer wrote:
Andi Kleen wrote:
On Sunday 04 February 2007 03:39, yhlu wrote:
commit dbf9272e863bf4b17ee8e3c66c26682b2061d40d
Author: Andi Kleen <ak@suse.de>
Date:   Tue Sep 26 10:52:36 2006 +0200
[PATCH] Don't force reserve the 640k-1MB range
Ok, but if that breaks LinuxBios then the problem is clearly in LinuxBIOS and needs to be fixed there.
Why? Is LinuxBIOS breaking some standard here?
If anything between 640K and 1MB isn't memory it should report that properly in the e820 map.
-Andi
Andi, I just reread Roman's original note. I am re-attaching it. It does seem to me that there is some problem in how linux is handling the map. Or am I missing something here? We're happy to fix any linuxbios issues, I just can't see any LinuxBIOS issues in the bug Roman describes.
From what I can see, the kernel is confused because part of the F segment is reserved and part is available as memory. Does the Linux code require that a 64k segment be all one or the other? What are the rules here?
thanks
ron
Andi Kleen ak@suse.de writes:
On Sunday 04 February 2007 11:45, Stefan Reinauer wrote:
Andi Kleen wrote:
On Sunday 04 February 2007 03:39, yhlu wrote:
commit dbf9272e863bf4b17ee8e3c66c26682b2061d40d
Author: Andi Kleen <ak@suse.de>
Date:   Tue Sep 26 10:52:36 2006 +0200
[PATCH] Don't force reserve the 640k-1MB range
Ok, but if that breaks LinuxBios then the problem is clearly in LinuxBIOS and needs to be fixed there.
Why? Is LinuxBIOS breaking some standard here?
If anything between 640K and 1MB isn't memory it should report that properly in the e820 map.
No, the problem was that the patch was incomplete. Linux is still reserving a ROM at 0xf0000-0xfffff. Now, when LinuxBIOS properly reports that area as RAM that you can do something with, Linux ignores it, because the ROM region is reserved first. Worse, the whole range is thrown out, which in this case was:
BIOS-e820: 00000000000f0400 - 0000000040000000 (usable)
So basically none of the memory below 4G is reported in /proc/iomem. Which is painful.
When /sbin/kexec goes to regenerate the e820 map, the huge holes in it make the whole thing fall over.
Which means we want to either delete the System ROM reservation, perform the System ROM reservation after other reservations from the e820 map, or improve the error handling.
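The third option, better error handling, could look roughly like the following sketch: instead of discarding a whole e820 range that collides with an existing reservation, keep the parts that do not overlap. This is a hypothetical helper (keep_non_overlapping, struct span), not kernel code, and it only handles a single conflicting range.

```c
#include <assert.h>

/* An inclusive address range, as printed in /proc/iomem. */
struct span { unsigned long start, end; };

/* Hypothetical helper: split req around an already-reserved range and
 * keep only the non-overlapping parts. Returns the number kept (0-2). */
static int keep_non_overlapping(struct span req, struct span busy,
				struct span out[2])
{
	int n = 0;

	if (req.end < busy.start || req.start > busy.end) {
		out[n++] = req;                  /* no overlap at all */
		return n;
	}
	if (req.start < busy.start)              /* piece below the busy range */
		out[n++] = (struct span){ req.start, busy.start - 1 };
	if (req.end > busy.end)                  /* piece above the busy range */
		out[n++] = (struct span){ busy.end + 1, req.end };
	return n;
}
```

For the range 0xf0400-0x3fffffff against the System ROM at 0xf0000-0xfffff, it keeps 0x100000-0x3fffffff, which is essentially what Roman's corrected e820 map achieves from the firmware side.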
I believe the following trivial patch will resolve the issue by reserving the ROMs later.
diff --git a/arch/x86_64/kernel/setup.c b/arch/x86_64/kernel/setup.c
index af425a8..f9610b7 100644
--- a/arch/x86_64/kernel/setup.c
+++ b/arch/x86_64/kernel/setup.c
@@ -522,8 +522,8 @@ void __init setup_arch(char **cmdline_p)
 	 * Request address space for all standard RAM and ROM resources
 	 * and also for regions reported as reserved by the e820.
 	 */
-	probe_roms();
 	e820_reserve_resources();
+	probe_roms();
 	e820_mark_nosave_regions();
 
 	request_resource(&iomem_resource, &video_ram_resource);
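The effect of this reordering can be replayed with a small model. It is a sketch under simplifying assumptions: only the System ROM reservation from probe_roms() and three entries from Roman's original e820 map are modelled (other ROMs such as the video ROM are left out), ranges live in one flat list, and any overlap is treated as a conflict, following the rule request_resource() applies to siblings. The names try_add, ram_registered and struct res are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* One registered range; is_ram marks "System RAM" entries. */
struct res { unsigned long start, end; int is_ram; };

/* Flat-list stand-in for request_resource(): refuse any overlap. */
static int try_add(struct res *list, size_t *n, struct res r)
{
	for (size_t i = 0; i < *n; i++)
		if (r.start <= list[i].end && r.end >= list[i].start)
			return -1;              /* conflict: range is dropped */
	list[(*n)++] = r;
	return 0;
}

/* Replays the registrations in either order and returns how many
 * bytes end up registered as RAM. */
static unsigned long ram_registered(int rom_first)
{
	const struct res system_rom = { 0xf0000UL, 0xfffffUL, 0 };
	const struct res e820_map[] = {          /* Roman's original map */
		{ 0xe18UL,   0x9ffffUL,    1 },      /* usable */
		{ 0xf0000UL, 0xf03ffUL,    0 },      /* reserved (IRQ/ACPI tables) */
		{ 0xf0400UL, 0x3fffffffUL, 1 },      /* usable */
	};
	struct res list[8];
	size_t n = 0;
	unsigned long bytes = 0;

	if (rom_first)
		try_add(list, &n, system_rom);   /* old order: probe_roms() first */
	for (size_t i = 0; i < sizeof e820_map / sizeof e820_map[0]; i++)
		try_add(list, &n, e820_map[i]);  /* e820_reserve_resources() */
	if (!rom_first)
		try_add(list, &n, system_rom);   /* patched order: ROMs last */

	for (size_t i = 0; i < n; i++)
		if (list[i].is_ram)
			bytes += list[i].end - list[i].start + 1;
	return bytes;
}
```

With the ROM reserved first, only 0x9f1e8 bytes of RAM are registered; with the ROM reserved last, the 0xf0400-0x3fffffff range is kept and the System ROM request is the one that gets dropped.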
On 02/04/2007 12:33 PM, Eric W. Biederman wrote:
Which means we want to either delete the System ROM reservation, perform the System ROM reservation after other reservations from the e820 map, or improve the error handling.
What is the point in making reservation of a ROM which does not exist at all at its legacy physical location? It is disinformation. I would not reserve any ROM (system, video, etc.) when there is no ROM. The c0000-f0000 range is also wrongly converted from RAM to ROM by the kernel.
Roman Kononov kononov195-lbl@yahoo.com writes:
On 02/04/2007 12:33 PM, Eric W. Biederman wrote:
Which means we want to either delete the System ROM reservation, perform the System ROM reservation after other reservations from the e820 map, or improve the error handling.
What is the point in making reservation of a ROM which does not exist at all at its legacy physical location? It is disinformation. I would not reserve any ROM (system, video, etc.) when there is no ROM. The c0000-f0000 range is also wrongly converted from RAM to ROM by the kernel.
From what I could see, the rest of those checks actually test for a ROM signature. Regardless, with my patch that switches the order, the current error-handling-challenged code will simply drop those regions.
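The signature test mentioned here refers to the legacy option-ROM header: a ROM image begins with the bytes 0x55 0xAA, and the third byte gives its size in 512-byte blocks. A minimal sketch of such a check follows; option_rom_length is a hypothetical name, and the kernel's probe_roms() also validates a checksum, which is omitted here. Plain RAM at a legacy ROM address will normally fail this test, whereas the System ROM reservation discussed earlier in the thread is made unconditionally.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sketch of a legacy option-ROM header check.
 * Returns the ROM length in bytes, or 0 if there is no valid
 * 0x55 0xAA signature or the claimed length exceeds the buffer. */
static size_t option_rom_length(const unsigned char *p, size_t avail)
{
	if (avail < 3 || p[0] != 0x55 || p[1] != 0xAA)
		return 0;
	size_t len = (size_t)p[2] * 512;   /* third byte: size in 512-byte blocks */
	return len <= avail ? len : 0;
}
```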
Eric
On Monday 05 February 2007 17:23, Roman Kononov wrote:
On 02/04/2007 12:33 PM, Eric W. Biederman wrote:
Which means we want to either delete the System ROM reservation, perform the System ROM reservation after other reservations from the e820 map, or improve the error handling.
What is the point in making reservation of a ROM which does not exist at all at its legacy physical location? It is disinformation. I would not reserve any ROM (system, video, etc.) when there is no ROM. The c0000-f0000 range is also wrongly converted from RAM to ROM by the kernel.
Yes, I'm tempted to just remove all the ROM probing code too, together with the sysrom, because it shouldn't be needed.
-Andi