Hello:
With Scott's work on PCIe support for the E350M1, the NIC and USB3 are now working -- Thanks Scott!
The remaining problems that I know of are:
1) Enabling coreboot serial debugging slows system boot dramatically: 5min+. Someone mentioned in IRC that this is because we are attempting to write to the serial device before it is ready, which causes some kind of timeout/backoff/retry sequence. How can I help with this?
2) System RAM is reported incorrectly. In Linux, "free -m" reports 480 MB of total RAM -- the full total is 4 GB. The vendor BIOS reports 3.5 GB. I did some digging and found that coreboot does see all of the RAM, but for some reason it is not marked as available RAM. From the log:
coreboot memory table:
 0. 0000000000000000-0000000000000fff: CONFIGURATION TABLES
 1. 0000000000001000-000000000009ffff: RAM
 2. 00000000000c0000-000000001efffbff: RAM
 3. 00000000c7ff0000-00000000c7ffffff: CONFIGURATION TABLES
 4. 00000000c8000000-00000000dfffffff: RESERVED
 5. 00000000f8000000-00000000f8ffffff: RESERVED
If my understanding is correct, there's a gap of about 2.6 GiB (2.8 GB) between areas 2 and 3. I don't understand why.
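For anyone who wants to check the gap mechanically, here is a minimal Python sketch. It assumes the table text is copied exactly from the log above; the helper names (parse, gaps) are made up for illustration and are not part of coreboot.

```python
import re

# Memory table copied from the boot log quoted above.
TABLE = """\
0. 0000000000000000-0000000000000fff: CONFIGURATION TABLES
1. 0000000000001000-000000000009ffff: RAM
2. 00000000000c0000-000000001efffbff: RAM
3. 00000000c7ff0000-00000000c7ffffff: CONFIGURATION TABLES
4. 00000000c8000000-00000000dfffffff: RESERVED
5. 00000000f8000000-00000000f8ffffff: RESERVED
"""

ENTRY = re.compile(r"(\d+)\.\s+([0-9a-f]{16})-([0-9a-f]{16}):\s+(.+)")


def parse(table):
    """Return a list of (index, start, end_inclusive, kind) tuples."""
    entries = []
    for line in table.splitlines():
        m = ENTRY.match(line.strip())
        if m:
            entries.append((int(m.group(1)), int(m.group(2), 16),
                            int(m.group(3), 16), m.group(4)))
    return entries


def gaps(entries):
    """Yield (prev_index, next_index, bytes) for each hole between neighbours."""
    for (i, _, end, _), (j, start, _, _) in zip(entries, entries[1:]):
        hole = start - (end + 1)  # end addresses in the table are inclusive
        if hole > 0:
            yield i, j, hole


entries = parse(TABLE)
ram_bytes = sum(end - start + 1 for _, start, end, kind in entries if kind == "RAM")
print(f"marked as RAM: {ram_bytes / 2**20:.1f} MiB")
for i, j, hole in gaps(entries):
    print(f"gap between entries {i} and {j}: {hole} bytes ({hole / 2**30:.2f} GiB)")
```

The sum of the two RAM entries comes out just under 500 MiB, which is in the same range as what "free -m" reports, so the missing memory appears to be whatever should have been listed between entries 2 and 3.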
I've experimented a bit with manually adding a call to lb_add_memory_range() in add_mainboard_resources() in mainboard/asrock/e350m1/mainboard.c to attempt to statically add LB_MEM_RAM to the table, but this results in an unbootable system 100% of the time, so I'm pretty sure I'm missing something.
If anyone has any hints/suggestions, or can at least point me at the right sections of code in coreboot and/or any relevant documentation, I would really appreciate the help.
The boot log (level INFO) is available at: http://www.lucidmachines.com/coreboot/boot
Thank you! -Marshall Buschman
Marshall Buschman wrote:
]Hello:
]
]With Scott's work on PCIe support for the E350M1, the NIC and USB3 are
]now working -- Thanks Scott!
Thanks for testing it on both the boards. Good to hear it works.
]The remaining problems that I know of are:
]
]1) Enabling coreboot serial debugging slows system boot dramatically: 5min+
]Someone mentioned in IRC that this is because we are attempting to write
]to the serial device before it is ready, which causes some kind of
]timeout/backoff/retry sequence. How can I help with this?
That is weird. The log file you sent is 38819 bytes. I would expect the boot time penalty to be not much more than the I/O time of 38819 bytes / (11520 bytes/second) = 3.37 seconds. I did a test with loglevel 8. It logged 45745 bytes and the boot time from cold reset to DOS prompt was 5.56 seconds. When I watch the serial output, it spews text nearly continuously. There is no hardware or software handshaking for the writes, so nothing should slow it down.
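The estimate above can be reproduced with a few lines of Python. This is only a sketch: the 115200 baud rate and 8N1 framing (10 bits on the wire per byte) are assumptions inferred from the 11520 bytes/second figure, and serial_seconds is a name made up for this example.

```python
def serial_seconds(num_bytes, baud=115200, bits_per_byte=10):
    """Time to push num_bytes through a UART with no flow control.

    bits_per_byte=10 assumes 8N1 framing: 1 start + 8 data + 1 stop bit.
    """
    bytes_per_second = baud / bits_per_byte
    return num_bytes / bytes_per_second


# Log sizes taken from the message above.
print(f"38819 bytes: {serial_seconds(38819):.2f} s")
print(f"45745 bytes: {serial_seconds(45745):.2f} s")
```

Under those assumptions the raw transmit time for either log is a few seconds, so a delay of several minutes has to come from somewhere other than the volume of output.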
]2) System RAM is reported incorrectly. In linux, "free -m" reports 480mb
]of total RAM -- the full total is 4gb.
I see, I had tested only a small memory configuration so far. It looks like any size greater than 4GB will fail. Try the attached patch.
Thanks, Scott
On 6/19/2011 1:38 AM, Scott Duplichan wrote:
> That is weird. The log file you sent is 38819 bytes. I would expect the boot time penalty to be not much more than the I/O time of 38819 bytes / (11520 bytes/second) = 3.37 seconds. I did a test with loglevel 8. It logged 45745 bytes and the boot time from cold reset to DOS prompt was 5.56 seconds. When I watch the serial output, it spews text nearly continuously. There is no hardware or software handshaking for the writes, so nothing should slow it down.
There's something strange afoot -- It's not that it outputs slowly, it's that you get literally no serial output or boot activity of any kind for potentially several minutes (Peter Stuge observed ~5min, I observed times closer to 20min).
> I see, I had tested only a small memory configuration so far. It looks like any size greater than 4GB will fail. Try the attached patch.
I will test this now and report back.
Thanks! -Marshall Buschman
On 06/19/2011 01:38 AM, Scott Duplichan wrote:
> I see, I had tested only a small memory configuration so far. It looks like any size greater than 4GB will fail. Try the attached patch.
Works great, 3.5 GB of available RAM. I've submitted it to the review system.
Thanks again! -Marshall
On 06/19/2011 02:17 AM, Marshall Buschman wrote:
> Works great, 3.5 GB of available RAM. I've submitted it to the review system.
Okay, there's something odd going on here. Now the NIC is gone. I'm going to investigate in the morning and abandon the change in Gerrit until I know what's going on. -Marshall
On 06/19/2011 02:33 AM, Marshall Buschman wrote:
> Okay, there's something odd going on here. Now the NIC is gone. I'm going to investigate in the morning and abandon the change in Gerrit until I know what's going on. -Marshall
Never mind, it works. Apparently there are disadvantages to doing things that require thought in the very early hours of the morning. :| Thanks! -Marshall Buschman
Marshall Buschman wrote:
]Nevermind, it works - Apparently there are disadvantages to doing things
]that require thought in the very early hours of the morning. :|
]Thanks!
Hello Marshall,
Thanks for the update. I tested Win7 with this change and 4GB of RAM and found it is not happy: Win7 crashes with a BSOD. WinDbg with a checked build reports:
-------------------------------------------------
ffffffff84126053: Store(TOM1=0xaaaaaaaa,MM1B)=0xaaaaaaaa
ffffffff8412605c: ShiftLeft(0x10000000,0x4,Local0)=0x100000000
ffffffff84126065: Subtract(Local0=0x100000000,TOM1=0xaaaaaaaa,Local0)=0x55555556
ffffffff8412606c: Store(Local0=0x55555556,MM1L)=0x55555556
ffffffff84126072: Return(CRES=Buffer(0x42){
0x47,0x01,0xf8,0x0c,0xf8,0x0c,0x01,0x08,0x88,0x0d,0x00,0x01,0x0c,0x03
0x00,0x00,0x00,0x00,0xf7,0x0c,0x00,0x00,0xf8,0x0c,0x88,0x0d,0x00,0x01
0x0c,0x03,0x00,0x00,0x00,0x0d,0xff,0xff,0x00,0x00,0x00,0xf3,0x86,0x09
0x00,0x00,0x00,0x00,0x0a,0x00,0x00,0x00,0x02,0x00,0x86,0x09,0x00,0x00
0xaa,0xaa,0xaa,0xaa,0x56,0x55,0x55,0x55,0x79,0x00})
ffffffff84126077: }
ACPI: E820 Entry 3 (type 4503599627370497) (c7fee00000000000-700000000) overlaps
ACPI: PCI Entry -1431655766 Min:ffffffff00000000 Max:5555555600000000 Length:100000000 Align:0
ACPI:
ACPI: FATAL BIOS ERROR - Need new BIOS to fix PCI problems
-------------------------------------------------
Unfortunately the Win7 code that prints the e820 message has an error where the argument and format string do not match. One is long and the other is long long. That is why the numbers are garbage. The real problem is that the asl code for _SB.PCI0._CRS is using the uninitialized variable TOM1. The default value of aaaaaaaa from line 267 of family14/ssdt.asl is being used.
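The arithmetic in the trace is easy to replay. The Python sketch below mirrors the three traced ASL operations and then unpacks the last descriptor in the CRES buffer. The field layout used for the 0x86 tag (one tag byte, a 16-bit length, one information byte, a 32-bit base and a 32-bit length, all little-endian) is my reading of the ACPI 32-bit fixed memory range descriptor and should be treated as an assumption; pci_window is a name invented for this example.

```python
import struct

TOM1_DEFAULT = 0xAAAAAAAA  # placeholder value seen in the trace


def pci_window(tom1):
    """Mirror the traced ASL: MM1B = TOM1, MM1L = (0x10000000 << 4) - TOM1."""
    four_gib = 0x10000000 << 4  # ShiftLeft(0x10000000, 0x4) = 0x100000000
    return tom1, four_gib - tom1


mm1b, mm1l = pci_window(TOM1_DEFAULT)

# Final 14 bytes of the CRES buffer from the trace: one 12-byte descriptor
# followed by the 2-byte end tag (0x79, 0x00).
tail = bytes([0x86, 0x09, 0x00, 0x00,
              0xaa, 0xaa, 0xaa, 0xaa,
              0x56, 0x55, 0x55, 0x55,
              0x79, 0x00])
tag, body_len, info, base, length = struct.unpack_from("<BHBII", tail, 0)

# Reinterpret the placeholder as a signed 32-bit integer.
signed_tom1 = struct.unpack("<i", struct.pack("<I", TOM1_DEFAULT))[0]

print(f"MM1B={mm1b:#x} MM1L={mm1l:#x}")
print(f"descriptor tag={tag:#x} base={base:#x} length={length:#x}")
print(f"TOM1 as signed 32-bit: {signed_tom1}")
```

The decoded base and length match MM1B and MM1L from the trace, and the signed reinterpretation of the placeholder equals the "PCI Entry" number in the ACPI message, which is consistent with the placeholder being what reaches the OS.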
Somehow the OS does need to know where the PCI hole can safely start. It can't start immediately after the end of low RAM because of UMA. _SB.PCI0._CRS is one way to pass this information. This method requires passing data from coreboot to ASL, which is a pain. I wonder if just reserving the UMA range in the e820 map is sufficient? I will try to do some experiments tonight.
If you can send me a binary or otherwise let me recreate the serial logging problem, I will take a look.
Thanks, Scott
Scott Duplichan wrote:
> The real problem is that the asl code for _SB.PCI0._CRS is using uninitialized variable TOM1. The default value of aaaaaaaa from line 267 of family14/ssdt.asl is being used.
Good find. Many thanks Scott.
> Somehow the OS does need to know where the PCI hole can safely start. It can't start immediately after the end of low ram because of uma. _SB.PCI0._CRS is one way to pass this information. This method requires passing data from coreboot to asl, which is a pain.
Is it necessarily that bad? Rudolf has developed some functions to build AML at coreboot run time. It sounds like they might help?
> I wonder if just reserving the uma range in the e820 map is sufficient? I will try to do some experiments tonight.
Maybe short term, but the only real solution is indeed to store the correct value from coreboot processing into AML.
> If you can send me a binary or otherwise let me recreate the serial logging problem, I will take a look.
http://stuge.se/stuge_e350m1_47b3fb_4mb.bin
//Peter
On 06/19/2011 03:05 PM, Peter Stuge wrote:
> http://stuge.se/stuge_e350m1_47b3fb_4mb.bin
To add another data point: using Peter's image, it takes roughly 1 minute and 51 seconds to boot. The log file is at http://www.lucidmachines.com/coreboot/1min51sec
Thanks! -Marshall
Peter Stuge wrote:
]> If you can send me a binary or otherwise let me recreate the serial
]> logging problem, I will take a look.
]
]http://stuge.se/stuge_e350m1_47b3fb_4mb.bin
Hello Peter,
Thanks. This shows the problem on my board. I have not been in the habit of enabling the kconfig option console_post, which is why I did not see the problem at first. I noticed the post code logging in Marshall's log file, but forgot to try adding that to mine. The problem occurs when code logs to the serial port before it is initialized. This happens with the first two calls to post_code(). Apparently the baud rate defaults to some really slow value, causing those 24 characters to take a long time to transmit. The attached patch is one way to fix the problem. It looks like this problem potentially affects several other coreboot projects.
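To get a feel for how slow "really slow" would have to be, here is a rough Python sketch. Everything in it is an assumption for illustration: a 16550-style UART with a 1.8432 MHz reference clock, 10 bits on the wire per character, and hypothetical divisor values. The thread does not say what the uninitialized divisor actually is.

```python
UART_CLOCK_HZ = 1_843_200  # assumption: common PC UART reference clock


def baud_from_divisor(divisor):
    """Baud rate of a 16550-style UART for a given divisor latch value."""
    return UART_CLOCK_HZ / (16 * divisor)


def transmit_seconds(num_chars, baud, bits_per_char=10):
    """Wire time for num_chars, assuming 8N1 framing (10 bits per character)."""
    return num_chars * bits_per_char / baud


# Hypothetical divisors: 1 (115200 baud), 12 (9600 baud), 0xFFFF (slowest).
for divisor in (1, 12, 0xFFFF):
    baud = baud_from_divisor(divisor)
    secs = transmit_seconds(24, baud)
    print(f"divisor {divisor:#06x}: {baud:10.2f} baud, 24 chars in {secs:.3f} s")
```

With the largest divisor, 24 characters already take on the order of minutes, so a stale or power-on divisor value would be enough to explain delays of the size reported in this thread.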
Thanks, Scott