[coreboot] serengeti_cheetah_fam10: Erratas triple fault in SimNow

Thu Mar 20 22:21:08 CET 2008

Hi Bernhard,

As you can see we are still working on the Barcelona code. There is a 
lot to debug. I have not tested it against SimNow, only on real hardware. I 
will work on adding SimNow to our (my) testing. The public version may 
not have some of these MSRs implemented. I will have to look into it 
more. I put some comments inline:

Bernhard Kaindl wrote:
> Hi,
>    Are you saying SimNow itself segfaults on you or is it coreboot
> which triple-faults inside SimNow?
>
> Maybe this is something different:
>
> I recently investigated why
>
> src/mainboard/amd/serengeti_cheetah_fam10/cache_as_ram_auto.c
>
> causes triple-faults inside the publicly available SimNow
> (the non-NDA version) and here is one of the causes:
>
>         /* FIXME: Check CPU revision to apply correct erratas */
>         /* Rev B errata */
>         /* Errata #169 - supercedes errata #131 */
>         msr = rdmsr(0xC001001F);
>         msr.hi |= 1 << (32 - 32);
>         wrmsr(0xC001101F, msr);
>
> This sets a different bit in a different MSR than the one indicated
> in Errata #169. To apply the Errata as indicated in the public
> document, a change like this is needed:
>
>         msr = rdmsr(0xC001001F);
> -       msr.hi |= 1 << (32 - 32);
> +       msr.hi |= 1 << 32;
> -       wrmsr(0xC001101F, msr);
> +       wrmsr(0xC001001F, msr);
>
>

msr.hi holds bits 63:32 and msr.lo holds bits 31:0, so MSR bit 32 is 
bit 0 of msr.hi, which is what 1 << (32 - 32) sets. Your shift of 32 
pushes the bit off the end, so 0 is being OR'd into msr.hi.
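
To make the hi/lo split concrete, here is a minimal sketch. The msr_t 
below is only a stand-in with the same two 32-bit halves, and 
msr_set_bit() is a hypothetical helper, not an existing coreboot 
function. Note also that in C a shift of a 32-bit int by 32 is undefined 
behavior, so the result of 1 << 32 should not be relied on either way.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for an MSR value split into two 32-bit halves (assumption). */
typedef struct {
    uint32_t lo; /* MSR bits 31:0  */
    uint32_t hi; /* MSR bits 63:32 */
} msr_t;

/* Hypothetical helper: set MSR bit `bit` (0..63) in the correct half. */
static msr_t msr_set_bit(msr_t msr, unsigned int bit)
{
    assert(bit < 64);
    if (bit < 32)
        msr.lo |= UINT32_C(1) << bit;        /* bits 31:0 live in lo */
    else
        msr.hi |= UINT32_C(1) << (bit - 32); /* bit 32 is bit 0 of hi */
    return msr;
}
```

With this, msr_set_bit(msr, 32) has the same effect as 
msr.hi |= 1 << (32 - 32).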

> The current code reads the correct MSR, sets a different bit
> (bit 0 instead of 32), and writes the changed value to a private,
> undocumented or even non-existing MSR, or maybe it's a typo.
>
> Sadly, bit 32 of 0xC001001F is also undocumented AFAICS, but
> Errata #169 says that it should be set. However, that errata
> was later updated to suggest that a register in the north
> bridge must also be changed, and I didn't find that part of the errata
> in coreboot yet.
>

Erratum #169 has been removed from the latest document, so it will be 
coming out of the code anyway.

> With that change (I guess it's a fix) SimNow executes this code
> but triples on the next errata implementation:
>
>         /* Errata #202 [DIS_PIGGY_BACK_SCRUB]=1 */
>         msr = rdmsr(0xC0011022);
>         msr.hi |= 1 << 24;
>         wrmsr(0xC0010022, msr);
>
> Again, this applies the changed MSR value to a different MSR
> which is also undocumented or even non-existing (or a typo). I also did
> not manage to find any information on Errata #202, so I guess
> it applies to AMD engineering samples only?
>
> I have no suggestion on how to fix that part as I could not
> find any documentation on it.
>
> ...
>
This erratum has also been removed, so we will remove it from coreboot.

> I also think that applying erratas which are not essential to have
> in the very earliest boot stage should not necessarily reside
> inside the mainboard-specific cache_as_ram_auto.c but be moved to
> a place in the compressed coreboot code where different boards
> can share errata implementations for the CPUs which they support.
>
This would be ideal if it could be in the compressed code, but it is very 
difficult to tell whether an erratum will be hit early in initialization. 
Also, many errata require a soft reset to take effect. As the comments 
note, I really didn't want that code in cache_as_ram_auto.c and I am 
working on moving it to the generic CPU code.

> When everything is set up, exceptions from wrmsr could also be
> handled better (I guess) than causing triple faults. Linux has
> wrmsr() functions with exception handling in include/asm-x86/msr.h
> which give a proper return code and do not crash the code.
>
There really isn't any exception handling in coreboot but it would be 
something to consider.
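
As a rough illustration of the idea only, the sketch below models an 
MSR write that returns an error code instead of faulting. Everything in 
it is an assumption: the MSR file is simulated in memory, the indices 
0x1000 and 0x1001 are placeholders, and sim_rdmsr_safe() / 
sim_wrmsr_safe() are hypothetical names. A real version would need a 
#GP handler that recovers from the faulting instruction, which this 
does not attempt.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint32_t lo; /* MSR bits 31:0  */
    uint32_t hi; /* MSR bits 63:32 */
} msr_t;

/* Simulated MSR file: only these (placeholder) indices "exist". */
struct sim_msr {
    uint32_t index;
    msr_t val;
};

static struct sim_msr sim_msrs[] = {
    { 0x1000, { 0, 0 } },
    { 0x1001, { 0, 0 } },
};

/* Look up a simulated MSR; NULL means "not implemented". */
static struct sim_msr *sim_find(uint32_t index)
{
    size_t i;

    for (i = 0; i < sizeof(sim_msrs) / sizeof(sim_msrs[0]); i++)
        if (sim_msrs[i].index == index)
            return &sim_msrs[i];
    return NULL;
}

/* Returns 0 on success, -1 where real hardware would raise #GP. */
static int sim_rdmsr_safe(uint32_t index, msr_t *out)
{
    struct sim_msr *m = sim_find(index);

    if (!m)
        return -1;
    *out = m->val;
    return 0;
}

/* Returns 0 on success, -1 where real hardware would raise #GP. */
static int sim_wrmsr_safe(uint32_t index, msr_t val)
{
    struct sim_msr *m = sim_find(index);

    if (!m)
        return -1;
    m->val = val;
    return 0;
}
```

A caller could then log and skip an erratum whose MSR is not 
implemented rather than taking the whole boot down.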
> netbsd has a very nice structure for that in place in which you
> can enter erratas simply by adding an entry in a table in which
> you specify for which CPU which errata shall be applied:
>
> http://cvsweb.netbsd.org/bsdweb.cgi/src/sys/arch/x86/x86/errata.c?rev=1.13&content-type=text/x-cvsweb-markup 
>
>
Yes, the new code I am working on is table and revision driven.
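
For readers unfamiliar with the approach, a table- and revision-driven 
scheme could look roughly like the sketch below. This is not the actual 
new code: the revision flags, the erratum IDs, the MSR indices and the 
bit values in the table are all placeholders, and the MSR file is 
simulated in memory so the example runs without hardware.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint32_t lo; /* MSR bits 31:0  */
    uint32_t hi; /* MSR bits 63:32 */
} msr_t;

/* Hypothetical revision flags; real code would derive these from CPUID. */
#define REV_A (1u << 0)
#define REV_B (1u << 1)

/* One table row: which revisions need the fix and which bits to set. */
struct errata_entry {
    unsigned int id;   /* placeholder erratum number */
    uint32_t rev_mask; /* revisions the erratum applies to */
    uint32_t msr;      /* MSR index, used for both read and write */
    uint32_t set_lo;   /* bits to OR into MSR bits 31:0 */
    uint32_t set_hi;   /* bits to OR into MSR bits 63:32 */
};

/* Placeholder rows, not taken from any revision guide. */
static const struct errata_entry errata_table[] = {
    { 1, REV_A | REV_B, 0x1000, 0,        1u << (32 - 32) },
    { 2, REV_B,         0x1001, 1u << 24, 0               },
};

/* Tiny simulated MSR file (assumption, stands in for rdmsr/wrmsr). */
static struct {
    uint32_t index;
    msr_t val;
} sim_msrs[] = {
    { 0x1000, { 0, 0 } },
    { 0x1001, { 0, 0 } },
};

static msr_t *sim_msr(uint32_t index)
{
    size_t i;

    for (i = 0; i < sizeof(sim_msrs) / sizeof(sim_msrs[0]); i++)
        if (sim_msrs[i].index == index)
            return &sim_msrs[i].val;
    return NULL;
}

/* Apply every row whose rev_mask matches cpu_rev; return rows applied. */
static size_t apply_errata(uint32_t cpu_rev)
{
    size_t i, applied = 0;

    for (i = 0; i < sizeof(errata_table) / sizeof(errata_table[0]); i++) {
        const struct errata_entry *e = &errata_table[i];
        msr_t *m;

        if (!(e->rev_mask & cpu_rev))
            continue;          /* not needed on this revision */
        m = sim_msr(e->msr);
        if (!m)
            continue;          /* MSR not implemented: skip */
        m->lo |= e->set_lo;
        m->hi |= e->set_hi;
        applied++;
    }
    return applied;
}
```

Because each row carries a single MSR index for both the read and the 
write, a mismatch like rdmsr(0xC0011022) followed by wrmsr(0xC0010022) 
cannot be expressed in the table.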

Thanks,
Marc

> Bernhard
>
> On Thu, 20 Mar 2008, Marc Karasek wrote:
>
>> I have gotten some cycles and they removed our proxy server
>> (hurrah!!!).  So I recompiled the BIOS for SimNOW using buildrom on my
>> test machine.
>>
>> When I went to run SimNOW it Seg Faults.  I tried it with the default
>> BIOS image for the Cheetah BSD, I also tried one of the other BSDs.  All
>> of them Seg Fault. :-(
>>
>> I made the mistake of updating Fedora8_64 with the latest RPMs.   Lesson
>> learned, if it ain't broke don't fix it...
>>
>> Does anyone have any idea what, I am guessing, package could be causing
>> this?  I have tried with both kernels that are on the machine, with no
>> success.  It did work at one point,  before the update.  I can nuke the
>> box and reinstall f8_64, but would rather not.
>>
>> -- 
>> *********************
>> Marc Karasek
>> MTS
>> Sun Microsystems
>> mailto:marc.karasek at sun.com
>> ph:770.360.6415
>> *********************
>

-- 
Marc Jones
Senior Firmware Engineer
(970) 226-9684 Office
mailto:Marc.Jones at amd.com
http://www.amd.com/embeddedprocessors