On 03/02/2017 01:18 PM, Paul Menzel via coreboot wrote:
Dear coreboot folks,
With 128 GB of RAM in eight 16 GB modules, coreboot takes over a minute to reach the payload on the Asus KGPE-D16, even with the serial console disabled [1]. That is not much faster than the vendor firmware.
Please note that the timings below are incorrect. Comparing them with the timings from SeaBIOS' `script/read_serial.py`, I'd say the values need to be multiplied by two. (Somebody mentioned that the reason might be that `cbmem -t` uses a scaling factor to convert the raw timestamps to microseconds, and that this factor might be wrong on this board.)
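A wrong scaling factor would explain an error of exactly two. The following minimal C sketch assumes, as suggested above, that the raw timestamps are counter ticks which are divided by a ticks-per-microsecond frequency. The helpers `ticks_to_us()` and `rescale_us()` and the 2000/1000 MHz figures are hypothetical illustrations, not taken from the cbmem sources.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: convert raw counter ticks to microseconds, assuming
 * the counter advances tick_mhz ticks per microsecond. */
static uint64_t ticks_to_us(uint64_t ticks, uint64_t tick_mhz)
{
    assert(tick_mhz > 0);
    return ticks / tick_mhz;
}

/* Hypothetical helper: if the tool divided by assumed_mhz while the counter
 * really runs at actual_mhz, the true duration is
 * reported_us * assumed_mhz / actual_mhz. */
static uint64_t rescale_us(uint64_t reported_us, uint64_t assumed_mhz,
                           uint64_t actual_mhz)
{
    assert(actual_mhz > 0);
    return reported_us * assumed_mhz / actual_mhz;
}
```

With an assumed frequency twice the real one, `ticks_to_us(74698656000, 2000)` yields the reported 37,349,328 for the selfboot jump, while dividing by the (hypothetical) real 1000 MHz gives 74,698,656, i.e. roughly 75 seconds, consistent with "over a minute".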
$ more asus/kgpe-d16/4.5-1093-g308aeff/2017-03-01T16_03_07Z/coreboot_timestamps.txt
21 entries total:

   0:1st timestamp                                 24,384
   1:start of rom stage                            25,061 (676)
   2:before ram initialization                    913,502 (888,441)
   3:after ram initialization                  35,548,889 (34,635,386)
   4:end of romstage                           35,642,960 (94,070)
   8:starting to load ramstage                 35,647,351 (4,391)
  15:starting LZMA decompress (ignore for x86) 35,647,872 (520)
  16:finished LZMA decompress (ignore for x86) 35,695,864 (47,991)
   9:finished loading ramstage                 35,696,312 (447)
  10:start of ramstage                         35,696,893 (581)
  30:device enumeration                        35,696,897 (3)
  40:device configuration                      36,639,627 (942,730)
  50:device enable                             36,644,848 (5,221)
  60:device initialization                     36,646,012 (1,163)
  70:device setup done                         37,044,848 (398,836)
  75:cbmem post                                37,044,850 (1)
  80:write tables                              37,044,851 (1)
  85:finalize chips                            37,053,950 (9,099)
  90:load payload                              37,324,647 (270,697)
  15:starting LZMA decompress (ignore for x86) 37,325,042 (395)
  16:finished LZMA decompress (ignore for x86) 37,349,321 (24,278)
  99:selfboot jump                             37,349,328 (7)
I think most of the time is spent in RAM initialization (between timestamps 2 and 3).
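The share of RAM initialization can be read straight off the table. Here is a small C sketch using the values of entries 2, 3 and 99 as printed. Because the suspected factor of two scales every value equally, it cancels out in the ratio. The macro and function names are only illustrative.

```c
#include <assert.h>
#include <stdint.h>

/* Values copied from the cbmem -t output above, in its reported units. */
#define TS_BEFORE_RAM_INIT 913502ULL   /* entry 2  */
#define TS_AFTER_RAM_INIT  35548889ULL /* entry 3  */
#define TS_SELFBOOT_JUMP   37349328ULL /* entry 99 */

/* Time spent between two timestamps. */
static uint64_t stage_duration(uint64_t start, uint64_t end)
{
    assert(end >= start);
    return end - start;
}

/* Share of part in whole, in tenths of a percent (rounded down). */
static uint64_t share_permille(uint64_t part, uint64_t whole)
{
    assert(whole > 0);
    return part * 1000 / whole;
}
```

`stage_duration(TS_BEFORE_RAM_INIT, TS_AFTER_RAM_INIT)` gives 34,635,387, one more than the printed delta (presumably a rounding artifact). `share_permille()` of that against `TS_SELFBOOT_JUMP` gives 927, so roughly 93 % of the time up to the payload jump goes into RAM initialization.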
- Do owners of boards with a similar amount of memory (regardless of the board) see similar numbers?
- What are the ways to improve this, if it is possible at all? For example, could the modules be probed in parallel (if that isn't done already)?
Thanks,
Paul
[1] https://review.coreboot.org/cgit/board-status.git/commit/?id=4b4b7ab5865b15a...
The issue isn't probing. The delays come from two places: DRAM training (DDR3 training is quite complex and involves repeatedly streaming pseudorandom data to and from the modules at full speed) and the mandatory clearing of the ECC check bits.
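To see why training costs so much time, here is a minimal C simulation of one common training approach, a delay sweep. For every candidate delay setting, a pseudorandom burst is written out and read back, and the center of the widest window of passing settings is kept. This is a generic sketch, not the AMD or coreboot training code. `struct lane_model`, its 20 to 40 passing window, `DELAY_STEPS` and `PATTERN_WORDS` are all hypothetical, and the simulated read stands in for programming a real delay register and accessing real DRAM.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PATTERN_WORDS 256 /* hypothetical burst length per test pass */
#define DELAY_STEPS    64 /* hypothetical number of delay settings per lane */

/* Hypothetical model of one byte lane: reads are reliable only when the
 * delay setting lies inside [lo, hi]. Real hardware offers no such oracle. */
struct lane_model {
    int lo, hi;
};

static const struct lane_model example_lane = { 20, 40 };

/* Marsaglia xorshift32: cheap pseudorandom words for the test pattern. */
static uint32_t xorshift32(uint32_t x)
{
    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    return x;
}

/* Simulated read: outside the passing window, one bit comes back flipped. */
static uint32_t sim_read(const struct lane_model *m, int delay, uint32_t stored)
{
    return (delay < m->lo || delay > m->hi) ? stored ^ 1u : stored;
}

/* One test pass: write a pseudorandom burst, read it back and compare. */
static int test_pass(const struct lane_model *m, int delay, uint32_t seed)
{
    static uint32_t buf[PATTERN_WORDS];
    uint32_t x = seed;
    size_t i;

    for (i = 0; i < PATTERN_WORDS; i++) {
        x = xorshift32(x);
        buf[i] = x;
    }
    x = seed; /* regenerate the same sequence to check the readback */
    for (i = 0; i < PATTERN_WORDS; i++) {
        x = xorshift32(x);
        if (sim_read(m, delay, buf[i]) != x)
            return 0;
    }
    return 1;
}

/* Sweep every delay setting with passes_per_step test passes each and
 * return the center of the widest passing window (-1 if nothing passes). */
static int train_lane(const struct lane_model *m, int passes_per_step)
{
    int best_start = -1, best_len = 0, run_start = -1;
    int d, p;

    assert(passes_per_step > 0);
    for (d = 0; d < DELAY_STEPS; d++) {
        int ok = 1;
        for (p = 0; p < passes_per_step; p++)
            ok &= test_pass(m, d, 0x12345678u + (uint32_t)p);
        if (!ok) {
            run_start = -1; /* window broken, start a new run later */
            continue;
        }
        if (run_start < 0)
            run_start = d;
        if (d - run_start + 1 > best_len) {
            best_len = d - run_start + 1;
            best_start = run_start;
        }
    }
    return best_len > 0 ? best_start + (best_len - 1) / 2 : -1;
}
```

For `example_lane`, `train_lane(&example_lane, 4)` returns 30, the middle of the 20 to 40 window. The work grows with delay steps × passes per step × burst length and is repeated for every lane, rank and module, which suggests why training time grows with the number of populated modules.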
The only feasible way to decrease boot time would be to run the DRAM training on each CPU package (and possibly on each memory controller, though I don't think that's a good idea) in parallel. This, in turn, ties into previous discussions on whether coreboot, and in particular coreboot's romstage, should even attempt to provide a multitasking environment, i.e. does the added complexity provide enough benefit to justify the maintenance overhead?
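The possible gain can be estimated with a simple timing model. When the packages are trained one after another, their times add up. When they are trained concurrently, the boot processor waits for the slowest one plus some coordination overhead. The sketch below assumes, hypothetically, that the measured RAM initialization time (34,635,386 in the reported units) splits evenly across the board's two CPU packages. The split, the helper names and the overhead parameter are illustrative only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical even split of the measured RAM init time across the two
 * CPU packages (reported units, not corrected for the factor of two). */
static const uint64_t example_split[2] = { 17317693, 17317693 };

/* Packages trained one after another: the times add up. */
static uint64_t sequential_time(const uint64_t *per_pkg, size_t n)
{
    uint64_t total = 0;
    for (size_t i = 0; i < n; i++)
        total += per_pkg[i];
    return total;
}

/* Packages trained concurrently: the boot CPU waits for the slowest one,
 * plus an assumed fixed cost for starting and synchronizing the others. */
static uint64_t parallel_time(const uint64_t *per_pkg, size_t n,
                              uint64_t sync_overhead)
{
    uint64_t slowest = 0;
    assert(n > 0);
    for (size_t i = 0; i < n; i++)
        if (per_pkg[i] > slowest)
            slowest = per_pkg[i];
    return slowest + sync_overhead;
}
```

Even in this best case, parallel training only halves the RAM initialization phase (17,317,693 instead of 34,635,386 with zero overhead). Uneven memory population or coordination overhead reduces the gain, which is the benefit to weigh against the added complexity in romstage.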
--
Timothy Pearson
Raptor Engineering
+1 (415) 727-8645 (direct line)
+1 (512) 690-0200 (switchboard)
https://www.raptorengineering.com