Hello,
While thinking about how to implement resume without the memory hole, I came across the following piece of code in the AMD CAR (cache-as-RAM) code:
set_init_ram_access(); /* So we can access RAM from [1M, CONFIG_RAMTOP) */
print_debug("Copying data from cache to RAM -- switching to use RAM as stack... ");
memcopy((void *)((CONFIG_RAMTOP)-CONFIG_DCACHE_RAM_SIZE), (void *)CONFIG_DCACHE_RAM_BASE, CONFIG_DCACHE_RAM_SIZE);
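The copy above places the CAR contents in the top CONFIG_DCACHE_RAM_SIZE bytes of the RAM window that set_init_ram_access() opens. Below is a minimal sketch of that address arithmetic. It assumes the stack pointer is afterwards shifted by the same offset (destination minus CONFIG_DCACHE_RAM_BASE). The EXAMPLE_* values and both helper functions are hypothetical illustrations, not coreboot code.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical example values; real ones come from the board configuration. */
#define EXAMPLE_RAMTOP          0x00200000u /* stands in for CONFIG_RAMTOP (2 MiB) */
#define EXAMPLE_DCACHE_RAM_BASE 0x000c8000u /* stands in for CONFIG_DCACHE_RAM_BASE */
#define EXAMPLE_DCACHE_RAM_SIZE 0x00008000u /* stands in for CONFIG_DCACHE_RAM_SIZE (32 KiB) */

/* Destination of the memcopy(): the top car_size bytes below ramtop. */
static uint32_t car_copy_dest(uint32_t ramtop, uint32_t car_size)
{
	uint32_t dest = ramtop - car_size;
	/* RAM is only accessible in [1M, RAMTOP), so the copy must fit there. */
	assert(ramtop > car_size && dest >= 0x100000u);
	return dest;
}

/* A pointer into CAR (e.g. the stack pointer) keeps its offset from the
 * start of the CAR region but moves to the RAM copy. p may equal
 * car_base + car_size (an empty, full-descending stack top). */
static uint32_t relocate_car_ptr(uint32_t p, uint32_t car_base,
                                 uint32_t car_size, uint32_t dest)
{
	assert(p >= car_base && p - car_base <= car_size);
	return dest + (p - car_base);
}
```

With these example values the CAR window 0xc8000..0xd0000 lands at 0x1f8000..0x200000, so a stack top of 0xd0000 becomes 0x200000.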
But the RAM destination is WB too! Therefore the memcopy _must_ evict some L1 lines, but evict them to where? Please note that this code is still executed with CAR enabled.
I answered this question using the Perf counters:
Dumping perf counters
00000000 <- Eviction of L2 to system memory (writebacks to system)
00001172 <- Eviction of data L1 to L2 of all previous states (MOESI)
00000b5f <- L1 Data Cache Refills from System
00000000
Copying data from cache to RAM -- switching to use RAM as stack...
Dumping perf counters II
00000000
0000120b
00000b6b
00000000
With a slightly different set of counters:
Dumping perf counters
00000039 <- L2 fills from L1
000007e2 <- Eviction of data L1 to L2 of all previous states (MOES) - excluding invalid
00000d0f <- L1 Data Cache Refills from System (excluding invalid)
00000000
Copying data from cache to RAM -- switching to use RAM as stack...
Dumping perf counters II
00000143
0000080e
00000d1c
00000000
It clearly shows that L2 is used for this kind of thing. Was this intended? I think the L2 also contains the cached ROM code, so the situation is a bit more complicated than one might expect.
The patch for this kind of analysis is attached if anyone is interested.
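For readers who want to reproduce the dumps without the attached patch, here is a minimal sketch of how a PerfEvtSel value is composed on AMD CPUs, following the field layout in the AMD64 Architecture Programmer's Manual, Vol. 2. The helper name perf_evtsel() is hypothetical. The event/unit-mask pair 0x44/0x1f in the tests is only an assumption for "data cache lines evicted, all MOESI states"; check the BKDG of your CPU family before trusting any event number.

```c
#include <assert.h>
#include <stdint.h>

/* Legacy AMD performance-counter MSRs (K8 and later). */
#define MSR_PERF_EVT_SEL0 0xc0010000u /* PerfEvtSel0..3 = 0xc0010000..0xc0010003 */
#define MSR_PERF_CTR0     0xc0010004u /* PerfCtr0..3    = 0xc0010004..0xc0010007 */

#define EVTSEL_USR (1ull << 16) /* count at CPL > 0 */
#define EVTSEL_OS  (1ull << 17) /* count at CPL 0, where firmware runs */
#define EVTSEL_EN  (1ull << 22) /* enable the counter */

/* Build a PerfEvtSel value for a 12-bit event number and 8-bit unit mask. */
static uint64_t perf_evtsel(uint16_t event, uint8_t unit_mask)
{
	assert(event < 0x1000); /* EventSelect is 12 bits wide */
	return (uint64_t)(event & 0xff)           /* EventSelect[7:0]  -> bits 7:0   */
	     | ((uint64_t)unit_mask << 8)         /* UnitMask          -> bits 15:8  */
	     | ((uint64_t)(event >> 8) << 32)     /* EventSelect[11:8] -> bits 35:32 */
	     | EVTSEL_OS | EVTSEL_USR | EVTSEL_EN;
}
```

The resulting value would be written with wrmsr to PerfEvtSel0 (after clearing PerfCtr0) before the memcopy, and PerfCtr0 read back with rdmsr afterwards.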
Oh, by the way: I had to add some clobbers to the inline assembly, otherwise GCC does not know that the content of ECX has been changed by the inline assembly...
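The same class of bug, illustrated as a sketch: inline assembly that changes ECX must tell GCC so. A register used as an input operand cannot also appear in the clobber list, so the usual fix (also used by the Linux i386 memcpy, via dummy "=&c" outputs) is to make it a read-write or dummy output operand. copy_rep_movsb() is a hypothetical name. This is x86/x86-64-only GCC/Clang inline assembly and assumes the direction flag is clear, as the ABI guarantees in ordinary C code.

```c
#include <assert.h>
#include <stddef.h>

/* Byte copy with "rep movsb". The instruction advances EDI/ESI and counts
 * ECX down to zero, so all three are declared read-write ("+") operands.
 * Without that, GCC may assume ECX still holds n after the asm. The
 * "memory" clobber tells GCC the asm reads and writes memory. */
static void *copy_rep_movsb(void *dst, const void *src, size_t n)
{
	void *d = dst;
	__asm__ __volatile__("rep movsb"
	                     : "+D"(d), "+S"(src), "+c"(n)
	                     :
	                     : "memory");
	assert(n == 0); /* GCC now knows ECX was consumed by the asm */
	return dst;
}
```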
Rudolf
> It clearly shows that L2 is used for this kind of thing. Was this intended? I think the L2 also contains the cached ROM code, so the situation is a bit more complicated than one might expect.
Thanks for the analysis. I can see it being useful for other things too.
My understanding was that CAR refers to L2. As long as nothing gets replaced from the L2, everything is as it should be. ROM contents can always be fetched again, so that's not critical for correctness.
Thanks, Myles
> Thanks for the analysis. I can see it being useful for other things too.
Ok.
> My understanding was that CAR refers to L2. As long as nothing gets replaced from the L2, everything is as it should be. ROM contents can always be fetched again, so that's not critical for correctness.
This is OK, but L2 CAR is described in more detail only for fam11h; otherwise AMD always just talks about L1 CAR. If I remember correctly, fam11h needs some extra tweaks to various MSRs to disable speculative fills. This is why I see this as a bit dangerous; perhaps older CPUs need this too.
I think we should mark the XIP region as WP instead of WB (check the fam11h BKDG).
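Under WP, reads are cached but writes are sent to the bus instead of producing dirty lines. A minimal sketch of the variable-range MTRR values that would mark a region WP, using the architectural memory-type encodings (UC=0, WC=1, WT=4, WP=5, WB=6) and the MTRRphysBase/MTRRphysMask layout. The helper name, the 64 KiB XIP region at 0xffff0000, and the 40-bit physical address width are assumptions for illustration only.

```c
#include <assert.h>
#include <stdint.h>

/* Memory type encodings used by the MTRRs. */
enum mtrr_type {
	MTRR_TYPE_UC = 0, /* uncacheable */
	MTRR_TYPE_WC = 1, /* write-combining */
	MTRR_TYPE_WT = 4, /* write-through */
	MTRR_TYPE_WP = 5, /* write-protect: reads cached, writes go to the bus */
	MTRR_TYPE_WB = 6, /* write-back */
};

#define MTRR_PHYS_MASK_VALID (1ull << 11)

struct var_mtrr {
	uint64_t base; /* value for MTRRphysBase_n */
	uint64_t mask; /* value for MTRRphysMask_n */
};

/* MTRR pair covering [base, base + size). size must be a power of two of
 * at least 4 KiB, and base must be size-aligned. */
static struct var_mtrr var_mtrr_for(uint64_t base, uint64_t size,
                                    enum mtrr_type type, unsigned addr_bits)
{
	assert(addr_bits > 12 && addr_bits < 64);
	assert(size >= 0x1000 && (size & (size - 1)) == 0);
	assert((base & (size - 1)) == 0);
	uint64_t addr_mask = (1ull << addr_bits) - 1;
	struct var_mtrr m = {
		.base = (base & addr_mask) | (uint64_t)type,
		.mask = (~(size - 1) & addr_mask) | MTRR_PHYS_MASK_VALID,
	};
	return m;
}
```

The two values would go into a free MTRRphysBase_n/MTRRphysMask_n pair (MSRs 0x200 + 2n and 0x201 + 2n).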
Anyway, I tried the copy with UC, and it looks like it is not so slow...
I have a patch in the works for the register clobber cleanup, and I will also do a patch for saving the coreboot memory to the resume area, but perhaps on Sunday. Tomorrow is a bit of skiing, but if you are curious, here is the patch. It just fixes the clobbers for the assembly routines; this has already bitten me while dumping the MSRs: the ECX value contained some garbage, and rdmsr raised an exception.
The memcpy code is from Linux kernel.
Rudolf
>> My understanding was that CAR refers to L2. As long as nothing gets replaced from the L2, everything is as it should be. ROM contents can always be fetched again, so that's not critical for correctness.
> This is OK, but L2 CAR is described in more detail only for fam11h; otherwise AMD always just talks about L1 CAR. If I remember correctly, fam11h needs some extra tweaks to various MSRs to disable speculative fills. This is why I see this as a bit dangerous; perhaps older CPUs need this too.
> I think we should mark the XIP region as WP instead of WB (check the fam11h BKDG).
I wish I knew more about it. I haven't done much with fam10h or anything with fam11h.
> Anyway, I tried the copy with UC, and it looks like it is not so slow...
> I have a patch in the works for the register clobber cleanup, and I will also do a patch for saving the coreboot memory to the resume area, but perhaps on Sunday. Tomorrow is a bit of skiing, but if you are curious, here is the patch. It just fixes the clobbers for the assembly routines; this has already bitten me while dumping the MSRs: the ECX value contained some garbage, and rdmsr raised an exception.
Good catch!
Myles