AMD SimNOW Seg Fault

Marc Karasek

20 Mar 2008

1:33 p.m.

I have gotten some cycles and they removed our proxy server (hurrah!!!). So I recompiled the BIOS for SimNOW using buildrom on my test machine.

When I went to run SimNOW, it seg faults. I tried it with the default BIOS image for the Cheetah BSD, and I also tried one of the other BSDs. All of them seg fault. :-(

I made the mistake of updating Fedora8_64 with the latest RPMs. Lesson learned, if it ain't broke don't fix it...

Does anyone have any idea which package (I am guessing it is a package) could be causing this? I have tried with both kernels that are on the machine, with no success. It did work at one point, before the update. I can nuke the box and reinstall f8_64, but would rather not.

-- ********************* Marc Karasek MTS Sun Microsystems mailto:marc.karasek@sun.com ph:770.360.6415 *********************

ron minnich

20 Mar

1:40 p.m.

My first guess would be thread libraries. Does it use them?

ron

Marc Karasek

2:34 p.m.

I am assuming it does. It depends on the kernel/processor for some of the simulation, hence the need for a 64-bit OS and an AMD processor.

If I do not hear anything in a day or so I am going to nuke the box back to ground zero, which I know worked.. :-)

ron minnich

2:41 p.m.

I would do an RPM inventory before nuking.

Run rpm -q -a and put the output somewhere.

Such fun.

ron

Bernhard Kaindl

2:46 p.m.

New subject: serengeti_cheetah_fam10: Erratas triple fault in SimNow (was: AMD SimNOW Seg Fault)

Hi, Are you saying SimNow itself segfaults on you or is it coreboot which triple-faults inside SimNow?

Maybe this is something different:

I recently investigated why

src/mainboard/amd/serengeti_cheetah_fam10/cache_as_ram_auto.c

causes triple-faults inside the publicly available SimNow (the non-NDA version), and here is one of the causes:

    /* FIXME: Check CPU revision to apply correct erratas */
    /* Rev B errata */
    /* Errata #169 - supercedes errata #131 */
    msr = rdmsr(0xC001001F);
    msr.hi |= 1 << (32 - 32);
    wrmsr(0xC001101F, msr);

This sets a different bit in a different MSR than the one indicated in Errata #169. To apply the erratum as described in the public document, a change like this is needed:

      msr = rdmsr(0xC001001F);
    - msr.hi |= 1 << (32 - 32);
    + msr.hi |= 1 << 32;
    - wrmsr(0xC001101F, msr);
    + wrmsr(0xC001001F, msr);

The current code reads the correct MSR, sets a different bit (bit 0 instead of 32), and writes the changed value to a private, undocumented or even non-existing MSR, or maybe it's a typo.

Sadly, bit 32 of 0xC001001F is also undocumented AFAICS, but Errata #169 says that it should be set. However, that erratum was later updated to suggest that a register in the north bridge must also be changed, and I didn't find that part of the erratum in coreboot yet.

With that change (I guess it's a fix) SimNow executes this code but triples on the next errata implementation:

    /* Errata #202 [DIS_PIGGY_BACK_SCRUB]=1 */
    msr = rdmsr(0xC0011022);
    msr.hi |= 1 << 24;
    wrmsr(0xC0010022, msr);

Again, this applies the changed MSR value to a different MSR which is also undocumented or even non-existing (or a typo). I also did not manage to find any information on Errata #202, so I guess it applies to AMD engineering samples only?
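Both snippets read one MSR address and write another. If that mismatch is a typo rather than intent (which is not established here), one way to rule it out is a read-modify-write helper that takes the address exactly once. The following is a minimal sketch, not coreboot code: `msr_t` mirrors the lo/hi split used in the snippets, while `fake_rdmsr`, `fake_wrmsr`, the one-slot backing store and `msr_set_bits` are hypothetical stand-ins so that the example can run in user space.

```c
#include <stdint.h>

/* Same split as in the snippets: lo = bits 31:0, hi = bits 63:32. */
typedef struct {
    uint32_t lo;
    uint32_t hi;
} msr_t;

/* Hypothetical user-space stand-in for the hardware: one simulated MSR. */
uint32_t fake_addr;
msr_t fake_value;

msr_t fake_rdmsr(uint32_t addr)
{
    msr_t zero = { 0, 0 };
    return (addr == fake_addr) ? fake_value : zero;
}

void fake_wrmsr(uint32_t addr, msr_t value)
{
    /* Writes to any other address are silently dropped in this model. */
    if (addr == fake_addr)
        fake_value = value;
}

/*
 * Read-modify-write on a single address. Because the address is passed
 * only once, the read and the write cannot target different MSRs.
 */
void msr_set_bits(uint32_t addr, uint32_t lo_mask, uint32_t hi_mask)
{
    msr_t msr = fake_rdmsr(addr);
    msr.lo |= lo_mask;
    msr.hi |= hi_mask;
    fake_wrmsr(addr, msr);
}
```

With such a helper, the second snippet would collapse to a single call like `msr_set_bits(0xC0011022, 0, 1 << 24)`, assuming 0xC0011022 is the intended register.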

I have no suggestion on how to fix that part as I could not find any documentation on it.

...

I also think that errata which are not essential in the very earliest boot stage should not necessarily reside inside the mainboard-specific cache_as_ram_auto.c, but should be moved to a place in the compressed coreboot code where different boards can share errata implementations for the CPUs which they support.

When everything is set up, exceptions from wrmsr could also be handled better (I guess) than by causing triple faults. Linux has wrmsr() functions with exception handling in include/asm-x86/msr.h which give a proper return code and do not crash the code.
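The idea of returning an error instead of faulting can be sketched in user space. This is not the Linux implementation, which recovers from the fault raised by the instruction itself. Here a hypothetical table of implemented MSRs, `sim_msrs`, stands in for the CPU model, `sim_wrmsr_safe` is an invented name, and the choice of which addresses count as implemented is arbitrary.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint32_t lo;   /* bits 31:0  */
    uint32_t hi;   /* bits 63:32 */
} msr_t;

/* Hypothetical simulator model: only these addresses are implemented. */
struct sim_msr {
    uint32_t addr;
    msr_t value;
};

struct sim_msr sim_msrs[] = {
    { 0xC001001F, { 0, 0 } },
    { 0xC0011022, { 0, 0 } },
};

/*
 * Returns 0 on success and -1 if the MSR is not implemented.
 * On real hardware, writing an unimplemented MSR raises #GP; with no
 * usable handler installed, that can escalate to a triple fault.
 */
int sim_wrmsr_safe(uint32_t addr, msr_t value)
{
    size_t count = sizeof(sim_msrs) / sizeof(sim_msrs[0]);

    for (size_t i = 0; i < count; i++) {
        if (sim_msrs[i].addr == addr) {
            sim_msrs[i].value = value;
            return 0;
        }
    }
    return -1;
}
```

A caller could then log the failing address and continue, rather than resetting the machine.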

NetBSD has a very nice structure for that in place, in which you can enter errata simply by adding an entry to a table that specifies which erratum shall be applied for which CPU:

http://cvsweb.netbsd.org/bsdweb.cgi/src/sys/arch/x86/x86/errata.c?rev=1.13&a...

Bernhard

Marc Jones

3:21 p.m.

New subject: serengeti_cheetah_fam10: Erratas triple fault in SimNow

Hi Bernhard,

As you can see we are still working on the Barcelona code. There is a lot to debug. I have not tested it against SimNow, only real hardware. I will work on adding SimNow to our (my) testing. The public version may not have some of these MSRs implemented. I will have to look into it more. I put some comments inline:

Bernhard Kaindl wrote:

...

Hi, Are you saying SimNow itself segfaults on you or is it coreboot which triple-faults inside SimNow?

Maybe this is something different:

I recently investigated why

src/mainboard/amd/serengeti_cheetah_fam10/cache_as_ram_auto.c

causes triple-faults inside the publicly available SimNow (the non-NDA version), and here is one of the causes:
    /* FIXME: Check CPU revision to apply correct erratas */
    /* Rev B errata */
    /* Errata #169 - supercedes errata #131 */
    msr = rdmsr(0xC001001F);
    msr.hi |= 1 << (32 - 32);
    wrmsr(0xC001101F, msr);
This sets a different bit in a different MSR than the one indicated in Errata #169. To apply the erratum as described in the public document, a change like this is needed:
      msr = rdmsr(0xC001001F);
    - msr.hi |= 1 << (32 - 32);
    + msr.hi |= 1 << 32;
    - wrmsr(0xC001101F, msr);
    + wrmsr(0xC001001F, msr);

msr.hi is bits 63:32, msr.lo is 31:0. Your shift of 32 pushes the bit off the end so 0 is being or'd onto msr.hi.
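To make the lo/hi split concrete: bit n of the 64-bit MSR lives in msr.lo when n < 32 and in msr.hi as bit n - 32 otherwise, so the original `1 << (32 - 32)` on msr.hi already addresses bit 32 of the full register. (In C, shifting a 32-bit value by 32 is undefined behavior, so the result of `1 << 32` should not be relied on either way.) A minimal sketch, assuming the two-field `msr_t` layout described above; `msr_set_bit` is a hypothetical helper, not an existing coreboot function.

```c
#include <stdint.h>

typedef struct {
    uint32_t lo;   /* bits 31:0  */
    uint32_t hi;   /* bits 63:32 */
} msr_t;

/*
 * Set bit `bit` (0..63) of a 64-bit MSR value held as two 32-bit halves.
 * Returns 0 on success, -1 if the bit index is out of range.
 */
int msr_set_bit(msr_t *msr, unsigned int bit)
{
    if (bit >= 64)
        return -1;

    if (bit < 32)
        msr->lo |= UINT32_C(1) << bit;
    else
        msr->hi |= UINT32_C(1) << (bit - 32);

    return 0;
}
```

Calling `msr_set_bit(&msr, 32)` leaves msr.lo untouched and sets bit 0 of msr.hi.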

...

The current code reads the correct MSR, sets a different bit (bit 0 instead of 32), and writes the changed value to a private, undocumented or even non-existing MSR, or maybe it's a typo.

Sadly, bit 32 of 0xC001001F is also undocumented AFAICS, but Errata #169 says that it should be set. However, that erratum was later updated to suggest that a register in the north bridge must also be changed, and I didn't find that part of the erratum in coreboot yet.

So, erratum #169 has been removed from the latest document, so it will be coming out of the code anyway.

...

With that change (I guess it's a fix) SimNow executes this code but triples on the next errata implementation:
    /* Errata #202 [DIS_PIGGY_BACK_SCRUB]=1 */
    msr = rdmsr(0xC0011022);
    msr.hi |= 1 << 24;
    wrmsr(0xC0010022, msr);
Again, this applies the changed MSR value to a different MSR which is also undocumented or even non-existing (or a typo). I also did not manage to find any information on Errata #202, so I guess it applies to AMD engineering samples only?

I have no suggestion on how to fix that part as I could not find any documentation on it.

...

This erratum has also been removed, so we will remove it from coreboot.

...

I also think that errata which are not essential in the very earliest boot stage should not necessarily reside inside the mainboard-specific cache_as_ram_auto.c, but should be moved to a place in the compressed coreboot code where different boards can share errata implementations for the CPUs which they support.

This would be ideal if it could be in the compressed code, but it is very difficult to tell if an erratum will be hit early in initialization. Also, many errata require the soft reset to take effect. As the comments note, I really didn't want that code in cache_as_ram_auto.c and I am working on moving it to the generic CPU code.

...

When everything is set up, exceptions from wrmsr could also be handled better (I guess) than by causing triple faults. Linux has wrmsr() functions with exception handling in include/asm-x86/msr.h which give a proper return code and do not crash the code.

There really isn't any exception handling in coreboot but it would be something to consider.

...

NetBSD has a very nice structure for that in place, in which you can enter errata simply by adding an entry to a table that specifies which erratum shall be applied for which CPU:

http://cvsweb.netbsd.org/bsdweb.cgi/src/sys/arch/x86/x86/errata.c?rev=1.13&a...

Yes, the new code I am working on is table- and revision-driven.
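A table- and revision-driven layout along these lines might look like the sketch below. It is not the code under development: the `errata_entry` structure, the revision flags, the erratum numbers 9001 and 9002 and the two table rows are all invented for illustration, the MSR addresses are reused from this thread only as placeholders, and `sim_rdmsr`/`sim_wrmsr` are user-space stand-ins for the real accessors.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint32_t lo;   /* bits 31:0  */
    uint32_t hi;   /* bits 63:32 */
} msr_t;

/* Hypothetical revision flags; a real table would use actual CPU revisions. */
#define REV_A (1u << 0)
#define REV_B (1u << 1)

struct errata_entry {
    unsigned int id;     /* erratum number (placeholder values below) */
    uint32_t rev_mask;   /* revisions the workaround applies to */
    uint32_t msr;        /* MSR to modify */
    uint32_t or_lo;      /* bits to set in bits 31:0 */
    uint32_t or_hi;      /* bits to set in bits 63:32 */
};

const struct errata_entry errata_table[] = {
    { 9001, REV_B,         0xC001001F, 0,        1u << 0 },
    { 9002, REV_A | REV_B, 0xC0011022, 1u << 24, 0       },
};

/* Simulated MSR store so the sketch runs without hardware access. */
struct sim_slot {
    uint32_t addr;
    msr_t value;
};

struct sim_slot sim_store[] = {
    { 0xC001001F, { 0, 0 } },
    { 0xC0011022, { 0, 0 } },
};

msr_t sim_rdmsr(uint32_t addr)
{
    msr_t zero = { 0, 0 };
    for (size_t i = 0; i < sizeof(sim_store) / sizeof(sim_store[0]); i++)
        if (sim_store[i].addr == addr)
            return sim_store[i].value;
    return zero;
}

void sim_wrmsr(uint32_t addr, msr_t value)
{
    for (size_t i = 0; i < sizeof(sim_store) / sizeof(sim_store[0]); i++)
        if (sim_store[i].addr == addr)
            sim_store[i].value = value;
}

/* Apply every table entry whose revision mask matches; return the count. */
size_t apply_errata(uint32_t cpu_rev)
{
    size_t applied = 0;

    for (size_t i = 0; i < sizeof(errata_table) / sizeof(errata_table[0]); i++) {
        const struct errata_entry *e = &errata_table[i];
        if (!(e->rev_mask & cpu_rev))
            continue;
        msr_t v = sim_rdmsr(e->msr);
        v.lo |= e->or_lo;
        v.hi |= e->or_hi;
        sim_wrmsr(e->msr, v);
        applied++;
    }
    return applied;
}
```

Adding or dropping a workaround then means editing one table row, and each row reads and writes the same MSR by construction.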

Thanks, Marc

-- Marc Jones Senior Firmware Engineer (970) 226-9684 Office mailto:Marc.Jones@amd.com http://www.amd.com/embeddedprocessors

Bernhard Kaindl

3:48 p.m.

New subject: serengeti_cheetah_fam10: Erratas triple fault in SimNow

Thanks, I'm glad for all the information which was in your response, and I am also happy to hear that the two errata have been removed from the latest documents and that they will be removed from coreboot.

Will there be continuous drops of the updates you are doing (like removing the two removed errata), or are you limited by legal review in a way which means you can only release code at longer intervals?

Thanks, Bernhard

Marc Jones

4 p.m.

New subject: serengeti_cheetah_fam10: Erratas triple fault in SimNow

Bernhard Kaindl wrote:

Will there be continuous drops of the updates you are doing (like removing the two removed errata), or are you limited by legal review in a way which means you can only release code at longer intervals?

I try to keep up as best I can. The release interval is limited by my ability to get everything done. It is difficult at this phase in the product cycle but we are working on ways to improve. This will be one of the topics at the summit.

Marc

Jordan Crouse

21 Mar

11:19 a.m.

See if you can get a core dump and send it to me with the details of your system. I'll send it to the SimNow team to see if they can see what's up.

Also, you can try to send me your ROM and I'll see if I can break it.

Jordan

-- Jordan Crouse Systems Software Development Engineer Advanced Micro Devices, Inc.

Marc Karasek

24 Mar

10:31 a.m.

Jordan,

Thanks for the offer. I was already in touch with the SimNOW group within AMD (trying all avenues to find a fix).

Turns out it was the max_map_count value being set too low. John Slice recommended a value of 8388608 and this works. I have changed my sysctl.conf to set this on boot.
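For reference, on Linux the limit is exposed at /proc/sys/vm/max_map_count and the sysctl.conf key is vm.max_map_count. A minimal sketch of a pre-flight check follows. `read_long_from_file` and `map_count_ok` are hypothetical helpers, and 8388608 is simply the value recommended above, not a documented SimNow requirement.

```c
#include <stdio.h>

/* Value recommended in this thread; treated here as an assumption. */
#define REQUIRED_MAP_COUNT 8388608L

/*
 * Read a single integer from a file such as /proc/sys/vm/max_map_count.
 * Returns -1 if the file cannot be opened or does not start with a number.
 */
long read_long_from_file(const char *path)
{
    FILE *f = fopen(path, "r");
    long value;

    if (!f)
        return -1;
    if (fscanf(f, "%ld", &value) != 1)
        value = -1;
    fclose(f);
    return value;
}

/* Nonzero when the current limit is at least the required one. */
int map_count_ok(long current, long required)
{
    return current >= required;
}
```

A launcher script or wrapper could call `map_count_ok(read_long_from_file("/proc/sys/vm/max_map_count"), REQUIRED_MAP_COUNT)` and print a hint about vm.max_map_count before starting the simulator.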

Now for the next issue: LAB is failing with a linuxrc[1] trap divide error. I saw this before and have booted FILO successfully (need to reverify). I had narrowed it down to Linux and not coreboot; I need to debug this further to find the problem...

Marc

coreboot@coreboot.org
