See patch
On 09.04.2010 21:41, Stefan Reinauer wrote:
See patch
Boot tested on via/vt8454c, so:
Acked-by: Patrick Georgi <patrick.georgi@coresystems.de>
On Fri, Apr 09, 2010 at 09:41:59PM +0200, Stefan Reinauer wrote:
See patch
This runs on my epia-cn. However, it's a bit slower than the romcc code.
Old code timing (r5005 with caching hacks):
00.000: <00>
00.006: <00>
00.343:
00.343:
00.343: coreboot-2.3" Fri Jan 8 20:36:20 EST 2010 starting...
00.344: *pre enable_smbus()
00.402: *post enable_smbus()
00.406: *pre ddr_ram_setup()
00.422: *post ddr_ram_setup()
00.424: Stage: loading fallback/coreboot_ram @ 0x4000 (163840 bytes), entry @ 0x4000
00.450: coreboot-2.3 Fri Apr 2 11:56:00 EDT 2010 booting...
New code timing (r5408):
00.000: <00>
00.446: 0
00.448:
00.448: coreboot-4.0-r5408M Sun Apr 11 22:51:47 EDT 2010 starting...
00.448: *pre enable_smbus()
00.448: *post enable_smbus()
00.452: *pre ddr_ram_setup()
00.464: *post ddr_ram_setup()
00.465: Stage: loading fallback/coreboot_ram @ 0x4000 (163840 bytes), entry @ 0x4000
01.421: coreboot-4.0-r5408M Sun Apr 11 22:51:47 EDT 2010 booting...
The old code was also using the tiny bootblock, but I had hacked it a bit to enable caching early. (With CAR, I can't do the caching tricks I used before.)
The delay after "Stage:..." is almost certainly caused by setting the stack to 0x4000000 in src/cpu/via/car/cache_as_ram.inc without caching that RAM. (The old code had the same problem, but I had hacked in a larger cache that was enabled earlier.) I'll try the same on the new code.
The extra time to get to the "coreboot-..." banner is a bit puzzling though. The timings are reproducible, so it's not a fluke. I'll have to investigate further.
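To put numbers on this, the two logs above reduce to a few intervals. A small C sketch with the timestamps copied from the logs (in milliseconds); the struct and function names are made up for illustration and are not coreboot code:

```c
#include <assert.h>

/* Milestones (in ms) copied from the two console logs above. */
struct boot_log {
	const char *label;
	int starting_ms; /* "coreboot-... starting..." banner */
	int stage_ms;    /* "Stage: loading fallback/coreboot_ram ..." */
	int booting_ms;  /* "coreboot-... booting..." (ramstage banner) */
};

static const struct boot_log old_romcc = { "r5005, romcc + caching hacks", 343, 424, 450 };
static const struct boot_log new_car   = { "r5408, CAR", 448, 465, 1421 };

/* Time between the "Stage:" line and the ramstage banner. */
static int stage_load_ms(const struct boot_log *l)
{
	assert(l->booting_ms >= l->stage_ms);
	return l->booting_ms - l->stage_ms;
}

/* How much later log b reaches the "starting..." banner than log a. */
static int banner_delay_ms(const struct boot_log *a, const struct boot_log *b)
{
	return b->starting_ms - a->starting_ms;
}
```

For these logs the load after "Stage:" grows from about 26 ms to about 956 ms, and the "starting..." banner appears about 105 ms later: the two regressions discussed in the rest of this thread.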
Great stuff though. Thanks.
-Kevin
On Sun, Apr 11, 2010 at 11:22:14PM -0400, Kevin O'Connor wrote:
The extra time to get to the "coreboot-..." banner is a bit puzzling though. The timings are reproducible, so it's not a fluke. I'll have to investigate further.
The extra time to get to the "coreboot-..." banner is due to the "rep lodsl" instructions that populate the cache in src/cpu/via/car/cache_as_ram.inc. I don't think the second "rep lodsl" (the one that prefetches the ROM) is needed; commenting it out gives better timings:
00.000: <00>
00.005: <00>
00.388: 0
00.389:
00.389: coreboot-4.0-r5408M Mon Apr 12 00:19:03 EDT 2010 starting...
00.390: *pre enable_smbus()
00.398: *post enable_smbus()
00.398: *pre ddr_ram_setup()
00.412: *post ddr_ram_setup()
00.415: Stage: loading fallback/coreboot_ram @ 0x4000 (163840 bytes), entry @ 0x4000
01.369: coreboot-4.0-r5408M Mon Apr 12 00:19:03 EDT 2010 booting...
Indeed, the time to "Stage:..." is faster than romcc now. Just need to fix that delay after "Stage:"..
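For readers less familiar with the idiom: with the direction flag clear, "rep lodsl" loads %ecx consecutive 32-bit words from (%esi) into %eax, each load overwriting the previous one, so its only lasting effect is that the touched memory ends up in the cache. A rough C analogue, run on an ordinary buffer (the function name is hypothetical, and it says nothing about how cache_as_ram.inc sets up %esi and %ecx):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * C analogue of "rep lodsl": read n dwords starting at p, each value
 * overwriting the last one like %eax does. volatile keeps the compiler
 * from dropping the reads. Returns the last dword read (0 if n == 0).
 */
static uint32_t touch_dwords(const volatile uint32_t *p, size_t n)
{
	uint32_t eax = 0;

	assert(p != NULL || n == 0);
	while (n--)
		eax = *p++;
	return eax;
}
```

The cost is one load per dword of the range whether or not that data is needed soon, which fits with dropping the ROM prefetch being a net win here.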
-Kevin
On 12.04.2010 06:41, Kevin O'Connor wrote:
Indeed, the time to "Stage:..." is faster than romcc now. Just need to fix that delay after "Stage:"..
Your other mail seems to indicate that this is because of the stack at 64MB.
How about this:
While building the romstage, the location and size of the ramstage area is already known.
If we:
- Move the intermediate stack to wherever the ramstage stack resides (somewhere close to RAMTOP),
- change the stage loaders (copy and ulzma) to leave out %esp..RAMTOP (with some safety margin below %esp), which should always be "0" anyway, and
- enable caching for RAMBASE..RAMTOP (already done, I think)
we should be using a harmless memory area for the stack (especially in light of wakeup from suspend) and have caching enabled for all relevant memory regions, right?
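The second point, leaving %esp..RAMTOP out of what the copy and ulzma loaders write, amounts to clipping the destination range around a protected window. A host-runnable sketch, with a hypothetical helper name and memory modelled as a byte array indexed by offset rather than real physical addresses:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper: copy a stage image of len bytes from src to
 * mem[dst..dst+len), but never write to mem[keep_lo..keep_hi), the
 * window holding the live stack (%esp minus a safety margin up to RAMTOP).
 */
static void copy_skip_window(uint8_t *mem, size_t dst, const uint8_t *src,
                             size_t len, size_t keep_lo, size_t keep_hi)
{
	size_t end = dst + len;

	assert(keep_lo <= keep_hi);
	/* Part of the image that lies below the protected window. */
	if (dst < keep_lo) {
		size_t stop = end < keep_lo ? end : keep_lo;
		memcpy(mem + dst, src, stop - dst);
	}
	/* Part of the image that lies above the protected window. */
	if (end > keep_hi) {
		size_t start = dst > keep_hi ? dst : keep_hi;
		memcpy(mem + start, src + (start - dst), end - start);
	}
	/* Bytes inside the window are skipped; per the proposal that part
	 * of the image should be all zero anyway. */
}
```

If the window lies entirely outside the destination range, this degenerates to a plain copy, so the clipping only costs two comparisons in the common case.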
Patrick
On Mon, Apr 12, 2010 at 11:56:45AM +0200, Patrick Georgi wrote:
On 12.04.2010 06:41, Kevin O'Connor wrote:
Indeed, the time to "Stage:..." is faster than romcc now. Just need to fix that delay after "Stage:"..
Your other mail seems to indicate that this is because of the stack at 64MB.
That's my guess - I'll try and confirm tonight.
If we:
- Move the intermediate stack to wherever the ramstage stack resides
(somewhere close to RAMTOP),
- change the stage loaders (copy and ulzma) to leave out %esp..RAMTOP
(with some safety margin below %esp), which should always be "0" anyway, and
- enable caching for RAMBASE..RAMTOP (already done, I think)
I didn't understand the first two points. The third point makes sense - right now on my board RAMBASE is 0x4000 and RAMTOP is 0x200000. So, if we make sure to cache everything up to RAMTOP and then place the ulzma stack somewhere in that memory range I think it should work.
Right now, the code in src/cpu/via/car/cache_as_ram.inc isn't setting up the cache using RAMBASE or RAMTOP - it just does its own range (which I think is the first 1Meg and a small part of flash) - it's the code between "call main" and "call copy_and_run". (BTW, having this all in one assembler file is much nicer than the old code.)
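A quick way to see why the stack at 0x4000000 (64MB) hurts while the stage load itself does not is to check each address range against the cached RAM window. A minimal sketch, assuming the cached RAM window is exactly [0, 1MB) as guessed above (the flash range is left out because its exact extent isn't given); the names are illustrative:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Half-open address range [base, base + size). */
struct range {
	uint32_t base;
	uint32_t size;
};

/* Assumption: the RAM range cache_as_ram.inc caches is the first 1MB. */
static const struct range cached_ram = { 0x00000000, 0x00100000 };

/* Is [addr, addr + len) entirely inside r? */
static bool range_contains(const struct range *r, uint32_t addr, uint32_t len)
{
	assert(len > 0);
	/* Work with offsets from r->base so nothing overflows 32 bits. */
	return addr >= r->base && addr - r->base < r->size &&
	       len <= r->size - (addr - r->base);
}
```

With RAMBASE = 0x4000, the 163840-byte stage fits inside the window, but a stack just below 0x4000000 does not, and neither does RAMBASE..RAMTOP (0x200000) as a whole, so caching up to RAMTOP would need a wider range than the current one.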
we should be using a harmless memory area for the stack (especially in light of wakeup from suspend) and have caching enabled for all relevant memory regions, right?
As a side note, for suspend, I wonder if it would be better not to enable CAR at all: just resume RAM and place the stack in an area of high memory that was reserved during the initial boot.
-Kevin
On 12.04.2010 15:39, Kevin O'Connor wrote:
On Mon, Apr 12, 2010 at 11:56:45AM +0200, Patrick Georgi wrote:
On 12.04.2010 06:41, Kevin O'Connor wrote:
Indeed, the time to "Stage:..." is faster than romcc now. Just need to fix that delay after "Stage:"..
Your other mail seems to indicate that this is because of the stack at 64MB.
That's my guess - I'll try and confirm tonight.
If we:
- Move the intermediate stack to wherever the ramstage stack resides
(somewhere close to RAMTOP),
- change the stage loaders (copy and ulzma) to leave out %esp..RAMTOP
(with some safety margin below %esp), which should always be "0" anyway, and
- enable caching for RAMBASE..RAMTOP (already done, I think)
I didn't understand the first two points. The third point makes sense
It seems that the stack is usually close to RAMTOP. We'd have to make sure that it's at a deterministic position (i.e. == RAMTOP) and avoid overwriting that stack during decompression; then we could reuse the ramstage stack location as the stack for decompression.
- right now on my board RAMBASE is 0x4000
Most boards have RAMBASE at 1MB (or 2MB in some cases). RAMBASE at 16K is only left for a couple of boards that rely on their own vgabios handling. Two things "should" happen (if someone with the board finds the time):
1. removal of the custom vgabios handling, using oprom instead
2. moving RAMBASE to 1MB
if we make sure to cache everything up to RAMTOP and then place the ulzma stack somewhere in that memory range I think it should work.
That "somewhere" could be the stack location of the ramstage.
Right now, the code in src/cpu/via/car/cache_as_ram.inc isn't setting up the cache using RAMBASE or RAMTOP - it just does its own range (which I think is the first 1Meg and a small part of flash) - it's the
Might be good to change this at some point.
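If the cache setup were changed to use RAMBASE/RAMTOP, the range would have to be expressed in a form the CPU's cache-control registers accept. Assuming x86 variable-range MTRRs (each covers a power-of-two-sized block aligned to its own size, at 4KB granularity; the VIA-specific details aren't covered in this thread), a sketch of splitting [base, top) into such blocks, with a hypothetical function name:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Split [base, top) into blocks that are each a power of two in size and
 * aligned to that size, the shape one variable-range MTRR can describe.
 * Stores up to max entries in out_base[]/out_size[]; returns the number
 * of blocks needed (which may exceed max).
 */
static size_t split_pow2_ranges(uint32_t base, uint32_t top,
                                uint32_t out_base[], uint32_t out_size[],
                                size_t max)
{
	size_t n = 0;

	/* Assumption: 4KB granularity for both ends of the range. */
	assert(base % 4096 == 0 && top % 4096 == 0 && base <= top);
	while (base < top) {
		/* Largest block the alignment of base allows (lowest set bit). */
		uint32_t size = base ? (uint32_t)(base & (0u - base)) : 0x80000000u;

		/* Shrink it until it fits below top. */
		while (size > top - base)
			size >>= 1;
		if (n < max) {
			out_base[n] = base;
			out_size[n] = size;
		}
		n++;
		base += size;
	}
	return n;
}
```

For RAMTOP = 0x200000, [0, RAMTOP) is a single block, whereas starting exactly at RAMBASE = 0x4000 takes seven, so in this sketch covering from 0 is the simpler choice.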
Patrick
On 12.04.2010 15:39, Kevin O'Connor wrote:
On Mon, Apr 12, 2010 at 11:56:45AM +0200, Patrick Georgi wrote:
On 12.04.2010 06:41, Kevin O'Connor wrote:
Indeed, the time to "Stage:..." is faster than romcc now. Just need to fix that delay after "Stage:"..
Your other mail seems to indicate that this is because of the stack at 64MB.
That's my guess - I'll try and confirm tonight.
If we:
- Move the intermediate stack to wherever the ramstage stack resides
(somewhere close to RAMTOP),
- change the stage loaders (copy and ulzma) to leave out %esp..RAMTOP
(with some safety margin below %esp), which should always be "0" anyway, and
- enable caching for RAMBASE..RAMTOP (already done, I think)
I didn't understand the first two points. The third point makes sense
It seems that the stack is usually close to RAMTOP. We'd have to make sure that it's at a deterministic position (i.e. == RAMTOP) and avoid overwriting that stack during decompression; then we could reuse the ramstage stack location as the stack for decompression.
Are we caching CONFIG_RAMBASE..CONFIG_RAMTOP?
Most boards have RAMBASE at 1MB (or 2MB in some cases). RAMBASE at 16K is only left for a couple of boards that rely on their own vgabios handling. Two things "should" happen (if someone with the board finds the time):
1. removal of the custom vgabios handling, using oprom instead
1.5. do the same thing for vsmsetup.c
2. moving RAMBASE to 1MB
Stefan
On Mon, Apr 12, 2010 at 04:21:34PM +0200, Patrick Georgi wrote:
On 12.04.2010 15:39, Kevin O'Connor wrote:
On Mon, Apr 12, 2010 at 11:56:45AM +0200, Patrick Georgi wrote:
If we:
- Move the intermediate stack to wherever the ramstage stack resides
(somewhere close to RAMTOP),
- change the stage loaders (copy and ulzma) to leave out %esp..RAMTOP
(with some safety margin below %esp), which should always be "0" anyway, and
- enable caching for RAMBASE..RAMTOP (already done, I think)
I didn't understand the first two points. The third point makes sense
It seems that the stack is usually close to RAMTOP. We'd have to make sure that it's at a deterministic position (i.e. == RAMTOP) and avoid overwriting that stack during decompression; then we could reuse the ramstage stack location as the stack for decompression.
Ahh - okay. I was getting the ramstage confused with romstage.
- right now on my board RAMBASE is 0x4000
Most boards have RAMBASE at 1MB (or 2MB in some cases). RAMBASE at 16K is only left for a couple of boards that rely on their own vgabios handling. Two things "should" happen (if someone with the board finds the time):
- removal of the custom vgabios handling, using oprom instead
- moving RAMBASE to 1MB
Agreed.
if we make sure to cache everything up to RAMTOP and then place the ulzma stack somewhere in that memory range I think it should work.
That "somewhere" could be the stack location of the ramstage.
Makes sense.
Right now, the code in src/cpu/via/car/cache_as_ram.inc isn't setting up the cache using RAMBASE or RAMTOP - it just does its own range (which I think is the first 1Meg and a small part of flash) - it's the
Might be good to change this at some point.
Agreed.
-Kevin