Add an option to make compression of ramstage configurable. Right now it is always compressed. On my Thinkpad, the complete boot to grub takes 4s, with around 1s required for decompressing ramstage. This is probably caused by the fact the decompression does a lot of single byte/word/qword accesses, which are really slow on SPI buses. So give the user the option to store ramstage uncompressed, if he has enough memory.
Signed-off-by: Sven Schnelle svens@stackframe.org
---
 Makefile.inc |    4 ++++
 src/Kconfig  |    8 ++++++++
 2 files changed, 12 insertions(+), 0 deletions(-)

diff --git a/Makefile.inc b/Makefile.inc
index 6c4a16a..6267539 100644
--- a/Makefile.inc
+++ b/Makefile.inc
@@ -85,7 +85,11 @@ cbfs-files-handler= \
 
 #######################################################################
 # a variety of flags for our build
+CBFS_COMPRESS_FLAG:=
+ifeq ($(CONFIG_COMPRESS_RAMSTAGE),y)
 CBFS_COMPRESS_FLAG:=l
+endif
+
 CBFS_PAYLOAD_COMPRESS_FLAG:=
 CBFS_PAYLOAD_COMPRESS_NAME:=none
 ifeq ($(CONFIG_COMPRESSED_PAYLOAD_LZMA),y)
diff --git a/src/Kconfig b/src/Kconfig
index 76e77f8..a782315 100644
--- a/src/Kconfig
+++ b/src/Kconfig
@@ -98,6 +98,14 @@ config USE_OPTION_TABLE
 	  Enable this option if coreboot shall read options from the "CMOS"
 	  NVRAM instead of using hard coded values.
 
+config COMPRESS_RAMSTAGE
+	bool "Compress ramstage with LZMA"
+	default y
+	help
+	  Compress ramstage to save memory in the flash image. Note
+	  that decompression might slow down booting if the BIOS flash
+	  is connected through a slow Link (i.e. SPI)
+
 endmenu
 
 source src/mainboard/Kconfig
On 02.05.2011 16:13, Sven Schnelle wrote:
> Add an option to make compression of ramstage configurable. Right now
> it is always compressed. On my Thinkpad, the complete boot to grub takes
> 4s, with around 1s required for decompressing ramstage. This is probably
> caused by the fact the decompression does a lot of single byte/word/qword
> accesses, which are really slow on SPI buses. So give the user the option
> to store ramstage uncompressed, if he has enough memory.
>
> Signed-off-by: Sven Schnelle svens@stackframe.org
Acked-by: Patrick Georgi patrick@georgi-clan.de
Sven Schnelle wrote:
]Add an option to make compression of ramstage configurable. Right now
]it is always compressed. On my Thinkpad, the complete boot to grub takes
]4s, with around 1s required for decompressing ramstage. This is probably
]caused by the fact the decompression does a lot of single byte/word/qword
]accesses, which are really slow on SPI buses. So give the user the option
]to store ramstage uncompressed, if he has enough memory.
]
]Signed-off-by: Sven Schnelle svens@stackframe.org
Hello Sven,
Thanks, I like having this option. For AMD Persimmon I get these boot times for coreboot+seabios+dos booting from an SSD:

                                     Standard   No compress
SPI 33 MHz fast mode, prefetch       0.690      0.717
SPI 33 MHz fast mode, no prefetch    0.933      1.041
AMD Simnow                           9.0        3.0

For this project, disabling compression slows booting slightly on real hardware, but it vastly improves Simnow boot time.
Is your SB SPI prefetch enabled? Using the cycle logging feature of DediProg EM100 shows that for AMD, SPI reads are all dwords until the SB SPI prefetch is enabled, at which time they become cache line reads.
Thanks, Scott
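A rough sketch of why the read width matters, not taken from any mail in this thread: each SPI read carries fixed command and address overhead, so fetching the same image in dwords takes many more bus transactions than fetching it in cache lines. The helper name, the 64-byte cache line size and the 256 KiB image size are assumptions for illustration only.

```c
#include <stddef.h>

/*
 * Hypothetical helper: number of bus reads needed to fetch `bytes`
 * bytes when every read transfers `read_size` bytes (rounded up).
 */
static size_t spi_reads_needed(size_t bytes, size_t read_size)
{
	return (bytes + read_size - 1) / read_size;
}
```

Under these assumptions, a 256 KiB image needs 65536 dword reads but only 4096 reads of 64 bytes each, a factor of 16 fewer transactions.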
* Sven Schnelle svens@stackframe.org [110502 16:13]:
> Add an option to make compression of ramstage configurable. Right now
> it is always compressed. On my Thinkpad, the complete boot to grub takes
> 4s, with around 1s required for decompressing ramstage. This is probably
> caused by the fact the decompression does a lot of single byte/word/qword
> accesses, which are really slow on SPI buses. So give the user the option
> to store ramstage uncompressed, if he has enough memory.
Hi Sven,
can you check whether your Thinkpad boots faster if you enable SPI prefetching in src/southbridge/intel/i82801gx/bootblock.c?
i.e.
static void enable_spi_prefetch(void)
{
	u8 reg8;
	device_t dev;

	dev = PCI_DEV(0, 0x1f, 0);

	reg8 = pci_read_config8(dev, 0xdc);
	reg8 &= ~(3 << 2);
	reg8 |= (2 << 2); /* Prefetching and Caching Enabled */
	pci_write_config8(dev, 0xdc, reg8);
}

static void bootblock_southbridge_init(void)
{
	...
	enable_spi_prefetch();
	...
}
Stefan
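The read-modify-write in enable_spi_prefetch() can be tried out without hardware. The following is a minimal sketch of just the bit manipulation: it clears bits 3:2 of the register value and writes the requested mode into them, leaving all other bits alone. The function and macro names are hypothetical; only the field position and the value 2 ("Prefetching and Caching Enabled") come from the snippet in the mail.

```c
#include <stdint.h>

#define SPI_PREFETCH_SHIFT 2                          /* hypothetical name */
#define SPI_PREFETCH_MASK  (3u << SPI_PREFETCH_SHIFT) /* bits 3:2 */

/* Returns `reg` with bits 3:2 replaced by `mode` (0..3); other bits kept. */
static uint8_t set_spi_prefetch_field(uint8_t reg, uint8_t mode)
{
	reg &= (uint8_t)~SPI_PREFETCH_MASK;
	reg |= (uint8_t)((mode & 3u) << SPI_PREFETCH_SHIFT);
	return reg;
}
```

For example, set_spi_prefetch_field(0xa5, 2) yields 0xa9: only the two-bit field changes.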
* Stefan Reinauer stefan.reinauer@coreboot.org [110502 20:34]:
> * Sven Schnelle svens@stackframe.org [110502 16:13]:
> > Add an option to make compression of ramstage configurable. Right now
> > it is always compressed. On my Thinkpad, the complete boot to grub takes
> > 4s, with around 1s required for decompressing ramstage. This is probably
> > caused by the fact the decompression does a lot of single byte/word/qword
> > accesses, which are really slow on SPI buses. So give the user the option
> > to store ramstage uncompressed, if he has enough memory.
>
> Hi Sven,
>
> can you check whether your Thinkpad boots faster if you enable SPI
> prefetching in src/southbridge/intel/i82801gx/bootblock.c?
>
> i.e.
>
> static void enable_spi_prefetch(void)
> {
> 	u8 reg8;
> 	device_t dev;
>
> 	dev = PCI_DEV(0, 0x1f, 0);
>
> 	reg8 = pci_read_config8(dev, 0xdc);
> 	reg8 &= ~(3 << 2);
> 	reg8 |= (2 << 2); /* Prefetching and Caching Enabled */
> 	pci_write_config8(dev, 0xdc, reg8);
> }
>
> static void bootblock_southbridge_init(void)
> {
> 	...
> 	enable_spi_prefetch();
> 	...
> }
And in addition you need to set up MTRRs correctly by doing something like this: (sorry, wrong CPU type but the code should be fairly similar)
Signed-off-by: Stefan Reinauer stefan.reinauer@coreboot.org
--- src/cpu/intel/model_106cx/cache_as_ram.inc
+++ src/cpu/intel/model_106cx/cache_as_ram.inc
@@ -195,13 +195,27 @@ clear_mtrrs:
 
 	post_code(0x38)
 
-	/* Enable Write Back and Speculative Reads for the first 1MB. */
+	/* Enable Write Back and Speculative Reads for the first MB
+	 * and coreboot_ram.
+	 */
 	movl $MTRRphysBase_MSR(0), %ecx
 	movl $(0x00000000 | MTRR_TYPE_WRBACK), %eax
 	xorl %edx, %edx
 	wrmsr
 	movl $MTRRphysMask_MSR(0), %ecx
-	movl $(~(1024 * 1024 - 1) | (1 << 11)), %eax
+	movl $(~(CONFIG_RAMTOP - 1) | MTRRphysMaskValid), %eax
+	xorl %edx, %edx
+	wrmsr
+
+	/* Enable Caching and speculative Reads for the
+	 * complete ROM now that we actually have RAM.
+	 */
+	movl $MTRRphysBase_MSR(1), %ecx
+	movl $(0xffc00000 | MTRR_TYPE_WRPROT), %eax
+	xorl %edx, %edx
+	wrmsr
+	movl $MTRRphysMask_MSR(1), %ecx
+	movl $(~(4*1024*1024 - 1) | MTRRphysMaskValid), %eax
 	xorl %edx, %edx
 	wrmsr
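To make the immediates in this patch easier to check, here is a small sketch that computes the low 32 bits written to a variable MTRR base/mask pair for a power-of-two sized region. The helper names are hypothetical. The valid bit (bit 11) matches the (1 << 11) that the patch replaces with MTRRphysMaskValid, and the memory type numbers are the architectural x86 encodings (5 for write-protect, 6 for write-back).

```c
#include <stdint.h>

#define MTRR_TYPE_WRPROT     5u          /* architectural write-protect type */
#define MTRR_TYPE_WRBACK     6u          /* architectural write-back type */
#define MTRR_PHYS_MASK_VALID (1u << 11)  /* valid bit in the PhysMask MSR */

/* Low dword of PhysBase: region base address ORed with the memory type. */
static uint32_t mtrr_base_lo(uint32_t base, uint32_t type)
{
	return base | type;
}

/* Low dword of PhysMask for a region of `size` bytes (power of two). */
static uint32_t mtrr_mask_lo(uint32_t size)
{
	return ~(size - 1u) | MTRR_PHYS_MASK_VALID;
}
```

With these helpers, the 4 MB ROM window from the patch works out to a base of 0xffc00005 and a mask of 0xffc00800, while the old 1 MB mask was 0xfff00800.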
Stefan Reinauer wrote:
> And in addition you need to set up MTRRs correctly by doing something
> like this: (sorry, wrong CPU type but the code should be fairly similar)
>
> Signed-off-by: Stefan Reinauer stefan.reinauer@coreboot.org

Acked-by: Peter Stuge peter@stuge.se

> --- src/cpu/intel/model_106cx/cache_as_ram.inc
> +++ src/cpu/intel/model_106cx/cache_as_ram.inc
> @@ -195,13 +195,27 @@ clear_mtrrs:
>
>  	post_code(0x38)
>
> -	/* Enable Write Back and Speculative Reads for the first 1MB. */
> +	/* Enable Write Back and Speculative Reads for the first MB
> +	 * and coreboot_ram.
> +	 */
>  	movl $MTRRphysBase_MSR(0), %ecx
>  	movl $(0x00000000 | MTRR_TYPE_WRBACK), %eax
>  	xorl %edx, %edx
>  	wrmsr
>  	movl $MTRRphysMask_MSR(0), %ecx
> -	movl $(~(1024 * 1024 - 1) | (1 << 11)), %eax
> +	movl $(~(CONFIG_RAMTOP - 1) | MTRRphysMaskValid), %eax
> +	xorl %edx, %edx
> +	wrmsr
> +
> +	/* Enable Caching and speculative Reads for the
> +	 * complete ROM now that we actually have RAM.
> +	 */
> +	movl $MTRRphysBase_MSR(1), %ecx
> +	movl $(0xffc00000 | MTRR_TYPE_WRPROT), %eax
> +	xorl %edx, %edx
> +	wrmsr
> +	movl $MTRRphysMask_MSR(1), %ecx
> +	movl $(~(4*1024*1024 - 1) | MTRRphysMaskValid), %eax
>  	xorl %edx, %edx
>  	wrmsr
Sven Schnelle wrote:
> +++ b/src/Kconfig
..
> +	help
> +	  Compress ramstage to save memory in the flash image. Note
> +	  that decompression might slow down booting if the BIOS flash
> +	  is connected through a slow Link (i.e. SPI)
Please write "boot flash" since there may not be any BIOS.
//Peter
Sven Schnelle svens@stackframe.org writes:
> Add an option to make compression of ramstage configurable. Right now
> it is always compressed. On my Thinkpad, the complete boot to grub takes
> 4s, with around 1s required for decompressing ramstage. This is probably
> caused by the fact the decompression does a lot of single byte/word/qword
> accesses, which are really slow on SPI buses. So give the user the option
> to store ramstage uncompressed, if he has enough memory.
Odd. Historically this has been solved by simply putting an MTRR over the compressed area, so that you still get full cache block transfers during the decompression. I am fuzzy about the appropriate mode. Write protect, I think.

Have you tried setting up an MTRR over the area that will be decompressed? That should result in something that is even faster than copying non-compressed data.
Eric
* Eric W. Biederman ebiederm@xmission.com [110502 22:00]:
> Sven Schnelle svens@stackframe.org writes:
> > Add an option to make compression of ramstage configurable. Right now
> > it is always compressed. On my Thinkpad, the complete boot to grub takes
> > 4s, with around 1s required for decompressing ramstage. This is probably
> > caused by the fact the decompression does a lot of single byte/word/qword
> > accesses, which are really slow on SPI buses. So give the user the option
> > to store ramstage uncompressed, if he has enough memory.
>
> Odd. Historically this has been solved by simply putting an MTRR over
> the compressed area, so that you still get full cache block transfers
> during the decompression. I am fuzzy about the appropriate mode.
> Write protect, I think.
>
> Have you tried setting up an MTRR over the area that will be
> decompressed? That should result in something that is even faster than
> copying non-compressed data.
The problem is that the code hard-coded those values, assuming coreboot lives in the first 1MB, which is not the case anymore since we have SMM handlers.
Stefan
Stefan Reinauer stefan.reinauer@coreboot.org writes:
> * Eric W. Biederman ebiederm@xmission.com [110502 22:00]:
> > Sven Schnelle svens@stackframe.org writes:
> > > Add an option to make compression of ramstage configurable. Right now
> > > it is always compressed. On my Thinkpad, the complete boot to grub takes
> > > 4s, with around 1s required for decompressing ramstage. This is probably
> > > caused by the fact the decompression does a lot of single byte/word/qword
> > > accesses, which are really slow on SPI buses. So give the user the option
> > > to store ramstage uncompressed, if he has enough memory.
> >
> > Odd. Historically this has been solved by simply putting an MTRR over
> > the compressed area, so that you still get full cache block transfers
> > during the decompression. I am fuzzy about the appropriate mode.
> > Write protect, I think.
> >
> > Have you tried setting up an MTRR over the area that will be
> > decompressed? That should result in something that is even faster than
> > copying non-compressed data.
>
> The problem is that the code hard-coded those values, assuming coreboot
> lives in the first 1MB, which is not the case anymore since we have SMM
> handlers.
Was that a destination hard code? The code itself should come out of the last couple of megabytes before 4G.
Regardless, the performance penalty for not caching is a huge fraction of the boot time, so whatever small practical issues exist, we should figure them out.
Eric
* Eric W. Biederman ebiederm@xmission.com [110503 01:26]:
> Was that a destination hard code? The code itself should come out of
> the last couple of megabytes before 4G.
Yes. Only the lower 1MB of the destination memory was cached, while coreboot's ram stage is now copied to 1MB.
My other mail to the list shows how to fix it.
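The destination problem can be illustrated with the usual variable MTRR match rule: an address is covered when (addr & mask) equals (base & mask). This is only a sketch, with a hypothetical helper name and with the type and valid bits left out. The 2 MB size used in the note below is an assumed stand-in for CONFIG_RAMTOP, whose real value is not given in this thread.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: true if `addr` lies inside the power-of-two
 * sized region starting at `base`, using the MTRR-style mask compare.
 */
static bool mtrr_covers(uint32_t base, uint32_t size, uint32_t addr)
{
	uint32_t mask = ~(size - 1u);

	return (addr & mask) == (base & mask);
}
```

With a 1 MB region at address 0, mtrr_covers(0, 0x100000, 0x100000) is false, so a ramstage copied to 1MB lands just outside the cached range; with the assumed 2 MB region the same address is covered.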
Hi Stefan, hi Eric,
Stefan Reinauer stefan.reinauer@coreboot.org writes:
> * Eric W. Biederman ebiederm@xmission.com [110503 01:26]:
> > Was that a destination hard code? The code itself should come out of
> > the last couple of megabytes before 4G.
>
> Yes. Only the lower 1MB of the destination memory was cached, while
> coreboot's ram stage is now copied to 1MB.
Thanks for all your help. I've made both changes (enabling SPI prefetch and setting the MTRRs right). Boot time has now decreased to 1.8s (with only 1s spent in coreboot). Decompression time is now about 100ms, which is much better than what we had before (1.9s only for ramstage loading).
Thanks,
Sven.