Dear coreboot folks,
Playing around with the trace feature of the Dediprog EM100Pro, I noticed several flash ROM accesses until the payload is loaded.
Are there ways or strategies to preload the whole flash ROM chip content into memory for faster access right after RAM is set up for example? What does that depend on? Does that make any sense at all?
Thanks,
Paul
On Sat, Jan 21, 2017 at 2:32 PM Paul Menzel via coreboot <coreboot@coreboot.org> wrote:
Are there ways or strategies to preload the whole flash ROM chip content into memory for faster access right after RAM is set up for example? What does that depend on? Does that make any sense at all?
Back on older chipsets, and maybe on newer ones, there was a setting called shadow ram. You could set it so writes went to shadow ram, and later set it so reads did. Why? It was designed to let you copy slow flash into fast ram and then use ram.
So:
  1. set shadow ram so that writes to the e0000-fffff range go to ram
  2. memcpy(e0000, e0000, 128K)
  3. set shadow ram so that reads from e0000-fffff go to ram
voila! you're fetching code/data from RAM.
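The three steps above can be modelled in a few lines of C. This is only a toy simulation, not chipset code: the struct, the two routing flags, and the function names are all made up for illustration, and on real hardware the routing is controlled by chipset-specific configuration registers that are not modelled here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SHADOW_BASE 0xe0000u
#define SHADOW_SIZE (128u * 1024u)

/* Toy model (hypothetical): one address range, two backing stores. */
struct shadow_model {
	uint8_t flash[SHADOW_SIZE]; /* slow ROM contents */
	uint8_t ram[SHADOW_SIZE];   /* fast RAM behind the same addresses */
	bool reads_go_to_ram;
	bool writes_go_to_ram;
};

static uint8_t model_read(const struct shadow_model *m, uint32_t addr)
{
	assert(addr >= SHADOW_BASE && addr - SHADOW_BASE < SHADOW_SIZE);
	uint32_t off = addr - SHADOW_BASE;
	return m->reads_go_to_ram ? m->ram[off] : m->flash[off];
}

static void model_write(struct shadow_model *m, uint32_t addr, uint8_t val)
{
	assert(addr >= SHADOW_BASE && addr - SHADOW_BASE < SHADOW_SIZE);
	if (m->writes_go_to_ram)
		m->ram[addr - SHADOW_BASE] = val;
	/* Otherwise the write targets ROM and is silently dropped. */
}

static void shadow_bios(struct shadow_model *m)
{
	/* Step 1: reads still come from flash, writes land in RAM. */
	m->reads_go_to_ram = false;
	m->writes_go_to_ram = true;

	/* Step 2: copy the region onto itself (flash -> RAM). */
	for (uint32_t a = SHADOW_BASE; a < SHADOW_BASE + SHADOW_SIZE; a++)
		model_write(m, a, model_read(m, a));

	/* Step 3: from now on reads are served from RAM. */
	m->reads_go_to_ram = true;
}
```

After shadow_bios() returns, a later model_write() changes only the RAM copy, so reads through the region can differ from what is actually in flash.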
Of course the RAM could then change after the copy. This was why sometimes people would dd if=/dev/mem etc. etc. and they'd see that the output of dd differed from what they thought was in FLASH: because the bios would modify itself for various reasons.
So, your question makes a lot of sense. Or at least it used to :-) Does it any more? That's an interesting question ... I see very few references to shadow ram in the code base any more.
Anyway ...
Now, linuxbios and (for a while) coreboot were designed to never persist. So the question becomes this: are you better off doing the full copy for a lot of ramstage code you'll use, at most, once; or are you better off letting the cache do its job once the CPU is working such that you don't need to mess with shadow ram?
factors include:
o to be sure you write shadow ram, you need to do a full read of flash and an (uncached) write to ram. This will be pretty slow. How slow?
o is your cache much larger than 128k in size? i.e. what are the odds that you'll never flush a cache line from flash, so you never fetch a cache line from flash twice?
o how much of the flash do you need to copy? Do you really use ALL the code in flash or just parts?
o if you count on cache, then the flash data only moves from flash to cpu die as you execute coreboot. If you use shadow ram, the data path is more like flash -> cache -> ram ... and, hey, if your cache is big, then the result of all that copying to ram is a full cache. If you only use it once, storing to ram was kind of pointless. Of course, it could be writeback, not writethrough, and so on.
o if you compress the ramstage, you're going to do that copy no matter what.
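To put a rough number on "how slow?", here is a back-of-the-envelope sketch. It assumes the flash sits on a SPI bus and that the transfer is limited only by the bus clock and the number of data lines; the function name and the 33 MHz figure used below are illustrative assumptions, not values from this thread, and command/address overhead and controller latency are ignored, so real reads will be slower.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Lower bound, in microseconds (rounded up), for moving `bytes` bytes
 * over a SPI bus clocked at `clock_hz` with `data_lines` data lines
 * (1 = single, 2 = dual, 4 = quad). Hypothetical helper.
 */
static uint64_t spi_read_time_us(uint64_t bytes, uint64_t clock_hz,
				 unsigned data_lines)
{
	assert(clock_hz > 0 && data_lines > 0);
	uint64_t bits = bytes * 8;
	uint64_t clocks = (bits + data_lines - 1) / data_lines;
	return (clocks * 1000000ull + clock_hz - 1) / clock_hz;
}
```

With an assumed 33 MHz single-line bus, spi_read_time_us(128 * 1024, 33000000, 1) gives 31776, i.e. roughly 32 ms for the 128K region, while a whole 16 MiB chip at the same settings comes out at a bit over 4 seconds.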
A quick git grep -i shadow.*ram shows very little usage nowadays. I have not looked at this in a very long time and my knowledge is almost certainly obsolete. Can someone more knowledgeable comment on shadow ram usage in newer parts/mainboards?
Peter's answer is on the money.
ron
Paul Menzel via coreboot wrote:
Are there ways or strategies to preload the whole flash ROM chip content into memory for faster access right after RAM is set up for example? What does that depend on? Does that make any sense at all?
That's called BIOS shadowing and was popular at least in the 90s.
I'm sure hardware still supports it, but I don't know if it makes a lot of sense.
If the trace shows a large number of random accesses in ramstage then it might make a significant difference, but I suspect it's not really a bottleneck.
The payload should be read using memcpy() anyway, so that wouldn't benefit.
And reading the entire chip only for ramstage doesn't make sense.
Finally, modern chipsets can and will prefetch at least some amount of flash contents, but maybe only a small number of bytes. I remember reading 16 bytes somewhere.
//Peter
Hi,
On 01/21/2017 02:30 PM, Paul Menzel via coreboot wrote:
Dear coreboot folks,
Playing around with the trace feature of the Dediprog EM100Pro, I noticed several flash ROM accesses until the payload is loaded.
Are there ways or strategies to preload the whole flash ROM chip content into memory for faster access right after RAM is set up for example? What does that depend on? Does that make any sense at all?
Preloading the whole flash is a bad idea, because you have to pay the upfront I/O cost of a whole-flash read. Most of that is going to be wasted anyway, because you likely need only parts of the flash at that point.
On Apollo Lake at least, the SPI hardware sequencer has some internal cache, and combined with the regular CPU cache (just set MTRRs to cover the memory-mapped SPI flash) it seems to work effectively. We recently found an issue where ramstage never cached the memory-mapped BIOS area, but that was addressed swiftly.
What you could do is pre-populate the cache with flash data right before it is going to be used. You could read just one byte from each page of the memory-mapped payload and let the SPI hardware read the whole page in the 'background' thanks to the prefetchers. This may be useful in the last stages of ramstage, while you are doing PCI device I/O and waiting/spinning. By the time you want to load the payload, it is already "preloaded" in the cache.
However, on Apollo Lake the grand total for I/O is less than 100 ms, and even less for the payload, so I suspect the benefit from such a hack would be pretty small.
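A minimal sketch of the "touch one byte per page" idea, assuming the payload is reachable through a plain memory-mapped pointer. The function name and the stride parameter are hypothetical, and whether a single read actually makes the hardware pull in the rest of the page depends on the platform's prefetchers and cache setup.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Read one byte every `stride` bytes from [base, base + len).
 * The volatile pointer keeps the compiler from dropping the reads;
 * the XOR of the touched bytes is returned only so the result is used.
 */
static uint8_t touch_region(const void *base, size_t len, size_t stride)
{
	assert(stride > 0);
	const volatile uint8_t *p = base;
	uint8_t sink = 0;

	for (size_t off = 0; off < len; off += stride)
		sink ^= p[off];
	return sink;
}
```

A caller would pass the mapped address and size of the payload and pick the stride to match whatever unit the hardware fetches; the right value is platform-specific and would have to be checked with a trace.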
Andrey
Adding onto what others have said:
The additional accesses you're seeing are the CBFS searches for additional ROMs and configuration data.
I don't think that it would save much time if we cached the cbfs locations and filenames, but it's possible that it might save a very small amount. I doubt that it's worth the added complexity though.
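For illustration, caching locations and filenames could look roughly like the sketch below. This is not coreboot's CBFS API: every type and function here is hypothetical, demo_walk() merely stands in for the slow directory walk over flash, and the offsets in its table are invented.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define LOC_CACHE_SLOTS 8

struct loc_entry {
	const char *name;
	uint32_t offset;
};

struct loc_cache {
	struct loc_entry slots[LOC_CACHE_SLOTS];
	size_t used;
	unsigned slow_walks; /* how often we had to go to flash */
	int (*walk_flash)(const char *name, uint32_t *offset);
};

/* Stand-in for walking the directory in flash (made-up offsets). */
static int demo_walk(const char *name, uint32_t *offset)
{
	static const struct loc_entry dir[] = {
		{ "fallback/ramstage", 0x10000 },
		{ "fallback/payload", 0x40000 },
	};

	for (size_t i = 0; i < sizeof(dir) / sizeof(dir[0]); i++) {
		if (strcmp(dir[i].name, name) == 0) {
			*offset = dir[i].offset;
			return 0;
		}
	}
	return -1;
}

/* Return 0 and fill *offset on success, -1 if the file does not exist. */
static int cached_locate(struct loc_cache *c, const char *name,
			 uint32_t *offset)
{
	assert(c && c->walk_flash && name && offset);

	for (size_t i = 0; i < c->used; i++) {
		if (strcmp(c->slots[i].name, name) == 0) {
			*offset = c->slots[i].offset;
			return 0; /* hit: no flash access */
		}
	}

	c->slow_walks++;
	if (c->walk_flash(name, offset) != 0)
		return -1;

	if (c->used < LOC_CACHE_SLOTS) {
		/* The caller must keep the name string alive. */
		c->slots[c->used].name = name;
		c->slots[c->used].offset = *offset;
		c->used++;
	}
	return 0;
}
```

The cache only helps on the second and later lookups of the same name, so with files that are looked up once the saving is close to nothing.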
Since we generally only read the files from the ROM once, even if we knew exactly which bits we were going to need, it doesn't save any time to pre-load the pieces into memory.
Martin