This moves the lx cache setup to stage1, not stage 2. Things run a bit faster.
Still far too slow, however.
ron
On 03.02.2008 21:49, ron minnich wrote:
This change moves the geodelxinit code from stage2 to stage1, which in turn gets cache turned on much sooner. The system boots a bit faster.
We're still far too slow, perhaps because we are not caching ROM?
Have you tried adding a "fillup" file to the lar which occupies the remaing unsused space? That should help a lot.
Besides that, having a boot time breakdown to find the worst offenders is an investment into the future. But how should we implement that? Maybe a Kconfig variable which adds TSC timestamps to all printk() calls? While rdtsc() is slow, it is not slow enough to actually show up in profiles if we use it together with printk.
Index: arch/x86/Makefile Add geodelxinit.o object Index: arch/x86/geodelx/geodelxinit.c svn move here from geode northbridge directory and add sizeram function. Index: arch/x86/geodelx/stage1.c add called to northbridge_init_early() Index: northbridge/amd/geodelx/Makefile remove geodelxinit.o object Index: northbridge/amd/geodelx/geodelxinit.c svn move this to arch/x86/geodelx Index: northbridge/amd/geodelx/geodelx.c remove call to northbridge_init_early()
Signed-off-by: Ronald G. Minnich rminnich@gmail.com
I'm not sure about this patch. It increases stage0/stage1 size and seems to move northbridge init to a generic arch directory. How northbridge specific is geodelxinit.c? Is it a piece of northbridge code? We could link northbridge/amd/geodelx/geodelxinit.c into stage1 even if it is inside the northbridge dir.
Regards, Carl-Daniel
On Feb 3, 2008 3:26 PM, Carl-Daniel Hailfinger c-d.hailfinger.devel.2006@gmx.net wrote:
Have you tried adding a "fillup" file to the lar which occupies the remaing unsused space? That should help a lot.
That's no longer the big time item. The big time item is now the decompress, which seems to run at 4 kilobytes per second.
Besides that, having a boot time breakdown to find the worst offenders is an investment into the future. But how should we implement that? Maybe a Kconfig variable which adds TSC timestamps to all printk() calls? While rdtsc() is slow, it is not slow enough to actually show up in profiles if we use it together with printk.
we can just put the option in kconfig. we had this option in v2.
I'm not sure about this patch. It increases stage0/stage1 size and seems to move northbridge init to a generic arch directory. How northbridge specific is geodelxinit.c? Is it a piece of northbridge code? We could link northbridge/amd/geodelx/geodelxinit.c into stage1 even if it is inside the northbridge dir.
The LX does not have the kind of clean breakdown of components we are used to on other systems. I don't think you can say geodelxinit is "northbridge specific" or not. I am happy to move it back to the northbridge directory and just link it in.
We've got to run this in stage1 IMHO. We can't wait until stage2 phase 2 to turn on cache. It's just way too slow.
ron
On 04.02.2008 00:32, ron minnich wrote:
On Feb 3, 2008 3:26 PM, Carl-Daniel Hailfinger c-d.hailfinger.devel.2006@gmx.net wrote:
Have you tried adding a "fillup" file to the lar which occupies the remaing unsused space? That should help a lot.
That's no longer the big time item. The big time item is now the decompress, which seems to run at 4 kilobytes per second.
Ouch. That really hurts. We probably want to add print_conf() from v2/src/mainboard/pcengines/alix1c/mainboard.c to v3. Comparing the v3 output of print_conf to the v2 output may be enlightening. I don't have any Geode LX boards, so I can't really test.
Besides that, having a boot time breakdown to find the worst offenders is an investment into the future. But how should we implement that? Maybe a Kconfig variable which adds TSC timestamps to all printk() calls? While rdtsc() is slow, it is not slow enough to actually show up in profiles if we use it together with printk.
we can just put the option in kconfig. we had this option in v2.
Sorry, I can't find such an option in v2 and the svn log also didn't turn up any hits. Could you point me to a specific file and/or revision? I have a (wrong due to varargs for vtxprintf) prototype patch for v3:
Index: lib/console.c =================================================================== --- lib/console.c (Revision 571) +++ lib/console.c (Arbeitskopie) @@ -7,6 +7,8 @@ int vtxprintf(void (*)(unsigned char, void *arg), void *arg, const char *, va_list);
+#define rdtscl(low) __asm__ __volatile__ ("rdtsc" : "=a" (low) : : "edx") + static int console_loglevel(void) { return CONFIG_DEFAULT_CONSOLE_LOGLEVEL; @@ -34,11 +36,15 @@ { va_list args; int i; + u32 tstamp;
if (msg_level > console_loglevel()) { return 0; }
+ rdtscl(tstamp); + vtxprintf(console_tx_byte, (void *)0, "[tsc %d] ", tstamp); + va_start(args, fmt); i = vtxprintf(console_tx_byte, (void *)0, fmt, args); va_end(args);
I'm not sure about this patch. It increases stage0/stage1 size and seems to move northbridge init to a generic arch directory. How northbridge specific is geodelxinit.c? Is it a piece of northbridge code? We could link northbridge/amd/geodelx/geodelxinit.c into stage1 even if it is inside the northbridge dir.
The LX does not have the kind of clean breakdown of components we are used to on other systems. I don't think you can say geodelxinit is "northbridge specific" or not. I am happy to move it back to the northbridge directory and just link it in.
Ah yes, the Geode architecture.
We've got to run this in stage1 IMHO. We can't wait until stage2 phase 2 to turn on cache. It's just way too slow.
Indeed. Without the file move, the patch is fine. Could you post a new version?
Thanks, Carl-Daniel
Here is the patch that does not do the move.
Thanks
ron
Hi Ron,
I really like it. One minor typo in arch/x86/Makefile, rest looks very good.
Regards, Carl-Daniel
On 04.02.2008 06:44, ron minnich wrote:
This change moves the geodelxinit code from stage2 to stage1, which in turn gets cache turned on much sooner. The system boots a bit faster.
We're still far too slow, perhaps because we are not caching ROM?
Index: arch/x86/Makefile Add ../../northbridge/amd/geodelx/geodelxinit.o object Index: arch/x86/geodelx/geodelxinit.c add sizeram function. Index: arch/x86/geodelx/stage1.c add called to northbridge_init_early() Index: northbridge/amd/geodelx/Makefile remove geodelxinit.o object remove sizeram function. Index: northbridge/amd/geodelx/geodelx.c remove call to northbridge_init_early()
Compile tested, no new warnings, no new errors. Acked-by: Carl-Daniel Hailfinger c-d.hailfinger.devel.2006@gmx.net
Index: northbridge/amd/geodelx/geodelx.c
--- northbridge/amd/geodelx/geodelx.c (revision 571) +++ northbridge/amd/geodelx/geodelx.c (working copy) @@ -142,49 +142,6 @@ };
/**
- Size up ram.
- All we need to do here is read the MSR for DRAM and grab out the sizing
- bits. Note that this code depends on initram having run. It uses the MSRs,
- not the SPDs, and the MSRs of course are set up by initram.
- @return TODO
- */
-int sizeram(void) -{
- struct msr msr;
- int sizem = 0;
- u32 dimm;
- /* Get the RAM size from the memory controller as calculated
* and set by auto_size_dimm().
*/
- msr = rdmsr(MC_CF07_DATA);
- printk(BIOS_DEBUG, "sizeram: _MSR MC_CF07_DATA: %08x:%08x\n", msr.hi,
msr.lo);
- /* DIMM 0 */
- dimm = msr.hi;
- /* Installed? */
- if ((dimm & 7) != 7) {
/* 1:8MB, 2:16MB, 3:32MB, 4:64MB, ... 7:512MB, 8:1GB */
sizem = 4 << ((dimm >> 12) & 0x0F);
- }
- /* DIMM 1 */
- dimm = msr.hi >> 16;
- /* Installed? */
- if ((dimm & 7) != 7) {
/* 1:8MB, 2:16MB, 3:32MB, 4:64MB, ... 7:512MB, 8:1GB */
sizem += 4 << ((dimm >> 12) & 0x0F);
- }
- printk(BIOS_DEBUG, "sizeram: sizem 0x%xMB\n", sizem);
- return sizem;
-}
-/**
- Currently not set up.
- @param dev The nortbridge device.
@@ -317,7 +274,7 @@
printk(BIOS_SPEW, ">> Entering northbridge.c: %s\n", __FUNCTION__);
- northbridge_init_early();
+// northbridge_init_early(); chipsetinit();
setup_realmode_idt(); Index: northbridge/amd/geodelx/geodelxinit.c =================================================================== --- northbridge/amd/geodelx/geodelxinit.c (revision 571) +++ northbridge/amd/geodelx/geodelxinit.c (working copy) @@ -25,7 +25,6 @@ #include <amd_geodelx.h>
/* Function prototypes */ -extern int sizeram(void);
struct gliutable { unsigned long desc_name; @@ -142,6 +141,49 @@ }
/**
- Size up ram.
- All we need to do here is read the MSR for DRAM and grab out the sizing
- bits. Note that this code depends on initram having run. It uses the MSRs,
- not the SPDs, and the MSRs of course are set up by initram.
- @return TODO
- */
+int sizeram(void) +{
- struct msr msr;
- int sizem = 0;
- u32 dimm;
- /* Get the RAM size from the memory controller as calculated
* and set by auto_size_dimm().
*/
- msr = rdmsr(MC_CF07_DATA);
- printk(BIOS_DEBUG, "sizeram: _MSR MC_CF07_DATA: %08x:%08x\n", msr.hi,
msr.lo);
- /* DIMM 0 */
- dimm = msr.hi;
- /* Installed? */
- if ((dimm & 7) != 7) {
/* 1:8MB, 2:16MB, 3:32MB, 4:64MB, ... 7:512MB, 8:1GB */
sizem = 4 << ((dimm >> 12) & 0x0F);
- }
- /* DIMM 1 */
- dimm = msr.hi >> 16;
- /* Installed? */
- if ((dimm & 7) != 7) {
/* 1:8MB, 2:16MB, 3:32MB, 4:64MB, ... 7:512MB, 8:1GB */
sizem += 4 << ((dimm >> 12) & 0x0F);
- }
- printk(BIOS_DEBUG, "sizeram: sizem 0x%xMB\n", sizem);
- return sizem;
+}
+/**
- Set up the system memory registers, i.e. memory that can be used
- for non-VSM (or SMM) purposes.
Index: northbridge/amd/geodelx/Makefile
--- northbridge/amd/geodelx/Makefile (revision 571) +++ northbridge/amd/geodelx/Makefile (working copy) @@ -22,7 +22,6 @@ ifeq ($(CONFIG_NORTHBRIDGE_AMD_GEODELX),y)
STAGE2_CHIPSET_OBJ += $(obj)/northbridge/amd/geodelx/geodelx.o \
$(obj)/northbridge/amd/geodelx/geodelxinit.o \ $(obj)/northbridge/amd/geodelx/vsmsetup.o
endif Index: arch/x86/geodelx/stage1.c =================================================================== --- arch/x86/geodelx/stage1.c (revision 571) +++ arch/x86/geodelx/stage1.c (working copy) @@ -55,6 +55,7 @@ */ void disable_car(void) {
extern void northbridge_init_early(void); int i;
for (i = 0; i < ARRAY_SIZE(msr_table); i++)
@@ -67,5 +68,6 @@ __asm__ __volatile__("cld; rep movsl" ::"D" (DCACHE_RAM_BASE), "S" (DCACHE_RAM_BASE), "c" (DCACHE_RAM_SIZE/4): "memory"); __asm__ __volatile__ ("wbinvd\n"); banner(BIOS_DEBUG, "Disable_car: done wbinvd");
- northbridge_init_early(); banner(BIOS_DEBUG, "disable_car: done");
} Index: arch/x86/Makefile =================================================================== --- arch/x86/Makefile (revision 571) +++ arch/x86/Makefile (working copy) @@ -117,6 +117,7 @@ ifeq ($(CONFIG_CPU_AMD_GEODELX),y) STAGE0_CAR_OBJ = geodelx/stage0.o STAGE0_ARCH_X86_OBJ += geodelx/stage1.o
- STAGE0_ARCH_X86_OBJ += ../..//northbridge/amd/geodelx/geodelxinit.o
Shouldn't this have only one slash before northbridge, like this: + STAGE0_ARCH_X86_OBJ += ../../northbridge/amd/geodelx/geodelxinit.o
else STAGE0_CAR_OBJ = stage0_i586.o endif
On Feb 4, 2008 1:48 AM, Carl-Daniel Hailfinger < c-d.hailfinger.devel.2006@gmx.net> wrote:
Hi Ron,
I really like it. One minor typo in arch/x86/Makefile, rest looks very good.
Compile tested, no new warnings, no new errors. Acked-by: Carl-Daniel Hailfinger c-d.hailfinger.devel.2006@gmx.net
I fixed the two // Committed revision 572.
ron minnich wrote:
We're still far too slow, perhaps because we are not caching ROM?
Correct. Here is a patch that changes that. BUT, here is a problem here. Due to some cache coherency snoop problems across pci we need the ROM cache properties to be write-serialize + cache disabled.
My suggestion is to reset the ROM cache properties in stage2 once code is executing in memory. The problem is that stage2 returns to stage1 to load the payload. v3 would be slow again trying to decompress the payload.
I don't have any good ideas to fix this yet. I don't know if any other systems will have similar problems with going back to the ROM to load payloads. Most BIOS "shadow" and this isn't an issues. It isn't an issue in v2 because the payload load code is in memory.
Marc
Why can't we shadow?
If we never write to the ROM space, at all, why not put the changed cache settings in right BEFORE we jump to the payload? i.e. don't change it to "slow" mode until we no longer need it cached?
thanks
ron
ron minnich wrote:
Why can't we shadow?
We could.
If we never write to the ROM space, at all, why not put the changed cache settings in right BEFORE we jump to the payload? i.e. don't change it to "slow" mode until we no longer need it cached?
I think that this would be ok. Maybe a generic call pre_payload() before the payload is run. I think that this would be a good genreic call for mainboard customization, like hardware_stage1(). It does mean that there will be duplicate code for each Geode platform.
I was working on a patch but it grows the size of the bootblock. It was only a few bytes too large but I grew it from 16KB to 20KB. I know we don't expect the bootblock to change much but should is be mainboard specific?
I'm still hitting the PCI device problem so I have only tested that it builds and the ROM is being cached. I can't get to the cache disable portion.
Marc
I think this is fine but I'd like to fit it into the "stage" nomenclature, so the numbering fits.
So which stage is this? I am not sure, I feel like the stage numbering never quite got finished :-)
possibly stage 3 is load payload, stage 4 is prepare the machine for a payload, and stage5 is run the payload?
ron
On 05.02.2008 07:31, ron minnich wrote:
I think this is fine but I'd like to fit it into the "stage" nomenclature, so the numbering fits.
So which stage is this? I am not sure, I feel like the stage numbering never quite got finished :-)
possibly stage 3 is load payload, stage 4 is prepare the machine for a payload, and stage5 is run the payload?
Remember that we load stage 2 like a payload, so we'd have execution order stage 0,1,3,4,5,2,3,4,5,realpayload. Stage 2 is also special because it lives completely in RAM, but it calls a lot of functions in bootblock ROM. If we shadow the boot block, most problems should disappear.
One thing I didn't understand: Marc Jones wrote:
Due to some cache coherency snoop problems across pci we need the ROM cache properties to be write-serialize + cache disabled.
The data in ROM doesn't change nor do we write to it. So where exactly do we need write snooping and/or write-serialize?
Regards, Carl-Daniel
On Feb 5, 2008 4:00 AM, Carl-Daniel Hailfinger c-d.hailfinger.devel.2006@gmx.net wrote:
Remember that we load stage 2 like a payload, so we'd have execution order stage 0,1,3,4,5,2,3,4,5,realpayload.
Not if the stage definition is in main. I realized some time ago that arch/x86/stage1.c is misnamed (my fault) -- since it alls all the stages.
It's really what we used to call hardwaremain, I wonder if we should just call it coreboot main or something.
Marc, is it really possible to shadow bios if it is up in high memory? Or do we have to shadow fxxxx addresses?
If we have to shadow fxxxx addresses, then we need to do something like this:
assembly First C code calls stage1 disable_car shadow boot block to fxxxx jump to entry point in fxxxx stage 2 payload, run it other code payload jump to payload
or not?
ron
ron minnich wrote:
On Feb 5, 2008 4:00 AM, Carl-Daniel Hailfinger c-d.hailfinger.devel.2006@gmx.net wrote:
Remember that we load stage 2 like a payload, so we'd have execution order stage 0,1,3,4,5,2,3,4,5,realpayload.
Not if the stage definition is in main. I realized some time ago that arch/x86/stage1.c is misnamed (my fault) -- since it alls all the stages.
It's really what we used to call hardwaremain, I wonder if we should just call it coreboot main or something.
I guess you can change the stage names and add a stageX for pre_payloaD() but that seems excessive.
Marc, is it really possible to shadow bios if it is up in high memory? Or do we have to shadow fxxxx addresses?
It wouldn't have to be there. It could be anywhere in main memory. After memory init in v2 the main coreboot image is copied to low memory and everything is run from there.
If we have to shadow fxxxx addresses, then we need to do something like this:
assembly First C code calls stage1 disable_car shadow boot block to fxxxx jump to entry point in fxxxx stage 2 payload, run it other code payload jump to payload
Instead of a shadow step I would change stage2 to contain the lar other utilities and payload loader for a couple of reasons. It does make stage 2 bigger but you still copy less to memory in the long run. In addition stage2 is compressed so I expect the growth in the ROM image wouldn't be too bad. Another advantage is that the bootblock(stage1.c) would not change as often since it won't do as much (I had to grow it to add the pre_payload call). Also, shadowing steps can be confusing to follow and that would be right in the middle of stage1.c and would grow the bootblock even more.
I am not saying we have to shadow. I think the patch I sent is a good enough solution for Geode. I don't want Geode to change the architecture and make things more difficult for CPUs that don't have these limitations.
Marc
Carl-Daniel Hailfinger wrote:
On 05.02.2008 07:31, ron minnich wrote:
I think this is fine but I'd like to fit it into the "stage" nomenclature, so the numbering fits.
So which stage is this? I am not sure, I feel like the stage numbering never quite got finished :-)
possibly stage 3 is load payload, stage 4 is prepare the machine for a payload, and stage5 is run the payload?
Remember that we load stage 2 like a payload, so we'd have execution order stage 0,1,3,4,5,2,3,4,5,realpayload. Stage 2 is also special because it lives completely in RAM, but it calls a lot of functions in bootblock ROM. If we shadow the boot block, most problems should disappear.
Since I was working in Stage1.c I assumed it was stage1 and that every other stage is called from stage1.
One thing I didn't understand: Marc Jones wrote:
Due to some cache coherency snoop problems across pci we need the ROM cache properties to be write-serialize + cache disabled.
The data in ROM doesn't change nor do we write to it. So where exactly do we need write snooping and/or write-serialize?
The problem is a transaction depth issue and bottlenecks inside the GX and LX that go across PCI. The conditions are very complicated but it comes down to we need write serialization for writes to PCI. If you look in the data book you can't have write serialization and the cache enabled on a given area. During coreboot we don't have to worry about a write or a PCI bus master so I think we can enable caching the ROM. After coreboot we can't be sure what will happen in the system so we need to set it up to be safe. For example flashrom just clears the write protect bit. If the cache were enabled (no write serialization) and flashrom was writing the ROM we would be in a precarious position. A PCI bus master doing a read or a write that has a hit on a tag would cause enough bottleneck conditions that it might hit the bug. We could change flashrom but that doesn't help other tools. We need to leave the system in a safe state. Also, caching the ROM after it is no longer used doesn't make much sense. So, we need a call just before the payload runs to clean up the system.
Marc
On 05.02.2008 18:23, Marc Jones wrote:
Carl-Daniel Hailfinger wrote:
On 05.02.2008 07:31, ron minnich wrote:
I think this is fine but I'd like to fit it into the "stage" nomenclature, so the numbering fits.
So which stage is this? I am not sure, I feel like the stage numbering never quite got finished :-)
possibly stage 3 is load payload, stage 4 is prepare the machine for a payload, and stage5 is run the payload?
Remember that we load stage 2 like a payload, so we'd have execution order stage 0,1,3,4,5,2,3,4,5,realpayload. Stage 2 is also special because it lives completely in RAM, but it calls a lot of functions in bootblock ROM. If we shadow the boot block, most problems should disappear.
Since I was working in Stage1.c I assumed it was stage1 and that every other stage is called from stage1.
Fully agreed.
One thing I didn't understand: Marc Jones wrote:
Due to some cache coherency snoop problems across pci we need the ROM cache properties to be write-serialize + cache disabled.
The data in ROM doesn't change nor do we write to it. So where exactly do we need write snooping and/or write-serialize?
The problem is a transaction depth issue and bottlenecks inside the GX and LX that go across PCI. The conditions are very complicated but it comes down to we need write serialization for writes to PCI. If you look in the data book you can't have write serialization and the cache enabled on a given area. During coreboot we don't have to worry about a write or a PCI bus master so I think we can enable caching the ROM. After coreboot we can't be sure what will happen in the system so we need to set it up to be safe. For example flashrom just clears the write protect bit. If the cache were enabled (no write serialization) and flashrom was writing the ROM we would be in a precarious position. A PCI bus master doing a read or a write that has a hit on a tag would cause enough bottleneck conditions that it might hit the bug. We could change flashrom but that doesn't help other tools. We need to leave the system in a safe state. Also, caching the ROM after it is no longer used doesn't make much sense. So, we need a call just before the payload runs to clean up the system.
Thanks for the explanation. So this final cleanup need not be applied for stage2, only for the real payload because we don't know what the payload may do.
Regards, Carl-Daniel
Carl-Daniel Hailfinger wrote:
On 05.02.2008 18:23, Marc Jones wrote:
Carl-Daniel Hailfinger wrote:
Thanks for the explanation. So this final cleanup need not be applied for stage2, only for the real payload because we don't know what the payload may do.
Correct. That was what my patch was supposed to do. There is still the question of the best place to call. I put it in mainboard so each platform would have a chance at last minute fixups. In Geode case, it could go in the northbride/cpu code....
Marc
On 05.02.2008 00:27, Marc Jones wrote:
ron minnich wrote:
Why can't we shadow?
We could.
Even at the top of the 4G address range? If that is possible, I'd vote to shadow the bootblock only to save RAM.
Some comments on the patch:
Cache the ROM to speed up stage2 and payload decompression.
Due to some cache coherency snoop problems across PCI GeodeLc needs the ROM cache properties to be write-serialize + cache disabled by runtime. Add pre_payload() call to each mainboard as the final coreboot function before the payload is called.
Signed-off by: Marc Jones marc.jones@amd.com
Index: coreboot-v3/northbridge/amd/geodelx/geodelxinit.c
--- coreboot-v3.orig/northbridge/amd/geodelx/geodelxinit.c 2008-02-04 13:35:52.000000000 -0700 +++ coreboot-v3/northbridge/amd/geodelx/geodelxinit.c 2008-02-04 13:36:12.000000000 -0700 @@ -658,7 +658,7 @@ #define SYSMEM_RCONF_WRITETHROUGH 8 #define DEVRC_RCONF_DEFAULT 0x21 #define ROMBASE_RCONF_DEFAULT 0xFFFC0000 -#define ROMRC_RCONF_DEFAULT 0x25 +#define ROMRC_RCONF_DEFAULT 0x04
Maybe keep the 0x25 as ROMRC_RCONF_SAFE?
/**
- TODO.
Index: coreboot-v3/mainboard/adl/msm800sev/stage1.c
--- coreboot-v3.orig/mainboard/adl/msm800sev/stage1.c 2008-02-04 15:41:14.000000000 -0700 +++ coreboot-v3/mainboard/adl/msm800sev/stage1.c 2008-02-04 15:50:04.000000000 -0700 @@ -53,3 +53,15 @@ cs5536_disable_internal_uart(); w83627hf_enable_serial(0x2e, SERIAL_DEV, SERIAL_IOBASE); }
+void pre_payload(void) +{
- struct msr msr;
- /* Set ROM cache properties for runtime. */
- msr = rdmsr(CPU_RCONF_DEFAULT);
- msr.hi &= ~(0xFF << 24); // clear ROMRC
- msr.hi |= 0x25 << 24; // set WS, CD, WP
Use ROMRC_RCONF_SAFE instead of 0x25?
- wrmsr(CPU_RCONF_DEFAULT, msr);
- banner(BIOS_DEBUG, "pre_payload: done");
+}
pre_payload() is not board specific, but Geode LX-specific. geodelxinit.c may be a more appropriate location.
Regards, Carl-Daniel
Carl-Daniel Hailfinger wrote:
On 05.02.2008 00:27, Marc Jones wrote:
ron minnich wrote:
Why can't we shadow?
We could.
Even at the top of the 4G address range? If that is possible, I'd vote to shadow the bootblock only to save RAM.
Some comments on the patch:
Cache the ROM to speed up stage2 and payload decompression.
Due to some cache coherency snoop problems across PCI GeodeLc needs the ROM cache properties to be write-serialize + cache disabled by runtime. Add pre_payload() call to each mainboard as the final coreboot function before the payload is called.
Signed-off by: Marc Jones marc.jones@amd.com
Index: coreboot-v3/northbridge/amd/geodelx/geodelxinit.c
--- coreboot-v3.orig/northbridge/amd/geodelx/geodelxinit.c 2008-02-04 13:35:52.000000000 -0700 +++ coreboot-v3/northbridge/amd/geodelx/geodelxinit.c 2008-02-04 13:36:12.000000000 -0700 @@ -658,7 +658,7 @@ #define SYSMEM_RCONF_WRITETHROUGH 8 #define DEVRC_RCONF_DEFAULT 0x21 #define ROMBASE_RCONF_DEFAULT 0xFFFC0000 -#define ROMRC_RCONF_DEFAULT 0x25 +#define ROMRC_RCONF_DEFAULT 0x04
Maybe keep the 0x25 as ROMRC_RCONF_SAFE?
ok
/**
- TODO.
Index: coreboot-v3/mainboard/adl/msm800sev/stage1.c
--- coreboot-v3.orig/mainboard/adl/msm800sev/stage1.c 2008-02-04 15:41:14.000000000 -0700 +++ coreboot-v3/mainboard/adl/msm800sev/stage1.c 2008-02-04 15:50:04.000000000 -0700 @@ -53,3 +53,15 @@ cs5536_disable_internal_uart(); w83627hf_enable_serial(0x2e, SERIAL_DEV, SERIAL_IOBASE); }
+void pre_payload(void) +{
- struct msr msr;
- /* Set ROM cache properties for runtime. */
- msr = rdmsr(CPU_RCONF_DEFAULT);
- msr.hi &= ~(0xFF << 24); // clear ROMRC
- msr.hi |= 0x25 << 24; // set WS, CD, WP
Use ROMRC_RCONF_SAFE instead of 0x25?
sure
- wrmsr(CPU_RCONF_DEFAULT, msr);
- banner(BIOS_DEBUG, "pre_payload: done");
+}
pre_payload() is not board specific, but Geode LX-specific. geodelxinit.c may be a more appropriate location.
Correct, but I foresee other non-Geode platforms needing a call just prior to the payload running but I am flexible on this. Do others have an opinion?
Marc
On Feb 5, 2008 5:38 PM, Marc Jones Marc.Jones@amd.com wrote:
Correct, but I foresee other non-Geode platforms needing a call just prior to the payload running but I am flexible on this. Do others have an opinion?
yes. I think we'll need this. And I would like, in the future, to get the stage numbering back because the question of "when does this run relative to other things" was a big point of confusion in v2.
But my ack still stands :-)
ron
On 06.02.2008 02:44, ron minnich wrote:
On Feb 5, 2008 5:38 PM, Marc Jones Marc.Jones@amd.com wrote:
Correct, but I foresee other non-Geode platforms needing a call just prior to the payload running but I am flexible on this. Do others have an opinion?
yes. I think we'll need this. And I would like, in the future, to get the stage numbering back because the question of "when does this run relative to other things" was a big point of confusion in v2.
But my ack still stands :-)
Despite my comments, Marc's patch also is Acked-by: Carl-Daniel Hailfinger c-d.hailfinger.devel.2006@gmx.net
We still can decide how to do this in a generic-yet-not-redundant way after the patch was committed. And maybe the version Marc commits is already perfect. ;-)
Regards, Carl-Daniel
For completeness here is the patch with the acks. r573 Marc
On 06.02.2008 03:38, Marc Jones wrote:
For completeness here is the patch with the acks. r573
Thanks. Qemu/x86 compilation fixed in r574.
Regards, Carl-Daniel
I see a big improvement in boot time. The only remaining issue is the scan for LAR entries in empty space.
ron
On 06.02.2008 06:52, ron minnich wrote:
I see a big improvement in boot time. The only remaining issue is the scan for LAR entries in empty space.
Great! Can you fill up empty LAR space with any uncompressed(!) file of the right size? That will give us a reasonable estimate of v3 boot time.
I'm very interested in how well v3 compares against v2. The cleanliness of v3 is great, but if it even comes with better speed, we have a really strong selling point.
Regards, Carl-Daniel
On Feb 6, 2008 5:53 AM, Carl-Daniel Hailfinger c-d.hailfinger.devel.2006@gmx.net wrote:
I'm very interested in how well v3 compares against v2. The cleanliness of v3 is great, but if it even comes with better speed, we have a really strong selling point.
Now there is a nice simple idea. Will try it tonight, but only after I get FILO to boot ...
ron
On 06.02.2008 02:38, Marc Jones wrote:
Carl-Daniel Hailfinger wrote:
On 05.02.2008 00:27, Marc Jones wrote:
/**
- TODO.
Index: coreboot-v3/mainboard/adl/msm800sev/stage1.c
--- coreboot-v3.orig/mainboard/adl/msm800sev/stage1.c 2008-02-04 15:41:14.000000000 -0700 +++ coreboot-v3/mainboard/adl/msm800sev/stage1.c 2008-02-04 15:50:04.000000000 -0700 @@ -53,3 +53,15 @@ cs5536_disable_internal_uart(); w83627hf_enable_serial(0x2e, SERIAL_DEV, SERIAL_IOBASE); }
+void pre_payload(void) +{
- struct msr msr;
- /* Set ROM cache properties for runtime. */
- msr = rdmsr(CPU_RCONF_DEFAULT);
- msr.hi &= ~(0xFF << 24); // clear ROMRC
- msr.hi |= 0x25 << 24; // set WS, CD, WP
Use ROMRC_RCONF_SAFE instead of 0x25?
sure
Thanks.
- wrmsr(CPU_RCONF_DEFAULT, msr);
- banner(BIOS_DEBUG, "pre_payload: done");
+}
pre_payload() is not board specific, but Geode LX-specific. geodelxinit.c may be a more appropriate location.
Correct, but I foresee other non-Geode platforms needing a call just prior to the payload running but I am flexible on this.
Sorry, I didn't express clearly enough what I wanted. I just tried to avoid duplication if this specific version of pre_payload() in mainboard code and hoped to have the code of this pre_payload() in geodelxinit.c. Maybe rename it to platform_pre_payload() and have it called by pre_payload() inside mainboard specific code. That would give us all the flexibility with minimum code duplication.
Regards, Carl-Daniel
Carl-Daniel Hailfinger wrote:
On 06.02.2008 02:38, Marc Jones wrote:
Carl-Daniel Hailfinger wrote:
pre_payload() is not board specific, but Geode LX-specific. geodelxinit.c may be a more appropriate location.
Correct, but I foresee other non-Geode platforms needing a call just prior to the payload running but I am flexible on this.
Sorry, I didn't express clearly enough what I wanted. I just tried to avoid duplication if this specific version of pre_payload() in mainboard code and hoped to have the code of this pre_payload() in geodelxinit.c. Maybe rename it to platform_pre_payload() and have it called by pre_payload() inside mainboard specific code. That would give us all the flexibility with minimum code duplication.
Regards, Carl-Daniel
Good point. I will fix this. Marc
Carl-Daniel Hailfinger wrote:
Sorry, I can't find such an option in v2 and the svn log also didn't turn up any hits. Could you point me to a specific file and/or revision? I have a (wrong due to varargs for vtxprintf) prototype patch for v3:
Index: lib/console.c
--- lib/console.c (Revision 571) +++ lib/console.c (Arbeitskopie) @@ -7,6 +7,8 @@ int vtxprintf(void (*)(unsigned char, void *arg), void *arg, const char *, va_list);
+#define rdtscl(low) __asm__ __volatile__ ("rdtsc" : "=a" (low) : : "edx")
static int console_loglevel(void) { return CONFIG_DEFAULT_CONSOLE_LOGLEVEL; @@ -34,11 +36,15 @@ { va_list args; int i;
u32 tstamp;
if (msg_level > console_loglevel()) { return 0; }
rdtscl(tstamp);
vtxprintf(console_tx_byte, (void *)0, "[tsc %d] ", tstamp);
va_start(args, fmt); i = vtxprintf(console_tx_byte, (void *)0, fmt, args); va_end(args);
NACK! x86 Assembler code (that only works on x86 cpus with a TSC) in generic C code is an absolute no-go! If this is supposed to go in at any later time please change it.
Stefan
On 04.02.2008 10:24, Stefan Reinauer wrote:
Carl-Daniel Hailfinger wrote:
Sorry, I can't find such an option in v2 and the svn log also didn't turn up any hits. Could you point me to a specific file and/or revision? I have a (wrong due to varargs for vtxprintf) prototype patch for v3:
Index: lib/console.c
=================================================================== --- lib/console.c (Revision 571) +++ lib/console.c (Arbeitskopie) @@ -7,6 +7,8 @@ int vtxprintf(void (*)(unsigned char, void *arg), void *arg, const char *, va_list);
+#define rdtscl(low) __asm__ __volatile__ ("rdtsc" : "=a" (low) : : "edx")
static int console_loglevel(void) { return CONFIG_DEFAULT_CONSOLE_LOGLEVEL; @@ -34,11 +36,15 @@ { va_list args; int i;
u32 tstamp;
if (msg_level > console_loglevel()) { return 0; }
rdtscl(tstamp);
vtxprintf(console_tx_byte, (void *)0, "[tsc %d] ", tstamp);
va_start(args, fmt); i = vtxprintf(console_tx_byte, (void *)0, fmt, args); va_end(args);
NACK! x86 Assembler code (that only works on x86 cpus with a TSC) in generic C code is an absolute no-go! If this is supposed to go in at any later time please change it.
Fully agreed. This was just a debugging hack. I'd love to do this in a generic way. Do you know where/when/whether we had printk timestamps in v2?
Regards, Carl-Daniel
we can provide a function, cycles, in the arch-dependent code. cycles returns whatever makes sense for the architecture.
Then this patch can be cleaned up.
thanks
ron
On 04.02.2008 16:45, ron minnich wrote:
we can provide a function, cycles, in the arch-dependent code. cycles returns whatever makes sense for the architecture.
Then this patch can be cleaned up.
What about this? (Varargs stuff still needs to get fixed.)
Regards, Carl-Daniel
Signed-off-by: Carl-Daniel Hailfinger c-d.hailfinger.devel.2006@gmx.net
Index: include/arch/x86/cpu.h =================================================================== --- include/arch/x86/cpu.h (Revision 571) +++ include/arch/x86/cpu.h (Arbeitskopie) @@ -196,4 +196,23 @@ __asm__ __volatile__("hlt" : : : "memory"); }
+/** + * Retrieve a 32-bit time stamp. The generic get_timestamp() offers no guarantee + * whatsoever about monotony, granularity, absence of wraparounds or linear + * relationship between the time stamp and real time. + * This implementation has wraparounds after 2^32 cycles of the TSC register + * and will usually have a linear dependency on real time. + * Qemu and Geode LX support the rdtsc instruction. + * If v3 ever is used on x86 architectures which don't support rdtsc, we have + * to provide an alternative. + * + * @return 32-bit truncated number of TSC cycles since poweron. + */ +u32 get_timestamp(void) +{ + u32 tstamp; + __asm__ __volatile__ ("rdtsc" : "=a" (tstamp) : : "edx"); + return tstamp; +} + #endif /* ARCH_X86_CPU_H */ Index: lib/console.c =================================================================== --- lib/console.c (Revision 571) +++ lib/console.c (Arbeitskopie) @@ -34,11 +34,16 @@ { va_list args; int i; + u32 tstamp;
if (msg_level > console_loglevel()) { return 0; }
+ tstamp = get_timestamp(); +#error vtxprintf expects varargs, need to fix up + vtxprintf(console_tx_byte, (void *)0, "[tstamp %d] ", tstamp); + va_start(args, fmt); i = vtxprintf(console_tx_byte, (void *)0, fmt, args); va_end(args);
Carl-Daniel Hailfinger wrote:
On 04.02.2008 16:45, ron minnich wrote:
we can provide a function, cycles, in the arch-dependent code. cycles returns whatever makes sense for the architecture.
Then this patch can be cleaned up.
What about this? (Varargs stuff still needs to get fixed.)
What exactly do you intend to fix?
#endif /* ARCH_X86_CPU_H */ Index: lib/console.c =================================================================== --- lib/console.c (Revision 571) +++ lib/console.c (Arbeitskopie) @@ -34,11 +34,16 @@ { va_list args; int i;
u32 tstamp;
if (msg_level > console_loglevel()) { return 0; }
tstamp = get_timestamp();
+#error vtxprintf expects varargs, need to fix up
NACK.
- vtxprintf(console_tx_byte, (void *)0, "[tstamp %d] ", tstamp);
- va_start(args, fmt); i = vtxprintf(console_tx_byte, (void *)0, fmt, args); va_end(args);
On 04.02.2008 18:40, Stefan Reinauer wrote:
Carl-Daniel Hailfinger wrote:
On 04.02.2008 16:45, ron minnich wrote:
we can provide a function, cycles, in the arch-dependent code. cycles returns whatever makes sense for the architecture.
Then this patch can be cleaned up.
What about this? (Varargs stuff still needs to get fixed.)
What exactly do you intend to fix?
vtxprintf expects to get called with va_list. My call to vtxprintf is wrong, but I forgot how to do it the right way.
#endif /* ARCH_X86_CPU_H */ Index: lib/console.c =================================================================== --- lib/console.c (Revision 571) +++ lib/console.c (Arbeitskopie) @@ -34,11 +34,16 @@ { va_list args; int i;
u32 tstamp;
if (msg_level > console_loglevel()) { return 0; }
tstamp = get_timestamp();
+#error vtxprintf expects varargs, need to fix up
NACK.
The #error will not be committed, of course. It is just there because I don't know how to call vtxprintf correctly. Please help.
- vtxprintf(console_tx_byte, (void *)0, "[tstamp %d] ", tstamp);
- va_start(args, fmt); i = vtxprintf(console_tx_byte, (void *)0, fmt, args); va_end(args);
Regards, Carl-Daniel
I would rather not have C code in .h files ... bloats code.
thanks
ron
On 04.02.2008 19:43, ron minnich wrote:
I would rather not have C code in .h files ... bloats code.
The header file in question is a collection of a dozen C functions, all of them rather small. Do we want to clean up the file to resemble a real header file?
Regards, Carl-Daniel
On Feb 4, 2008 11:03 AM, Carl-Daniel Hailfinger < c-d.hailfinger.devel.2006@gmx.net> wrote:
On 04.02.2008 19:43, ron minnich wrote:
I would rather not have C code in .h files ... bloats code.
The header file in question is a collection of a dozen C functions, all of them rather small. Do we want to clean up the file to resemble a real header file?
Let's put cleaning it up on the list of TODOs, but at the same time let's not grow it.
I don't want to change things too much now that we're almost booting.
ron