Hi all,
while porting an HP DL 145G3 to K10 support with rev C2 Opterons, our team stumbled across the same problem Ward described in his thread. I tried all the errata fixes and the latest microcode patch files, without success.

We tracked the problem down to the CpuFindCapability function, which returns faulty offsets for addressing the HT PHY links. For instance, when link 0 is to be accessed, the function returns 0xA0 as the offset, which leads AMD_checkLinkType to check for a present and correctly initialized channel on link 1 instead. Then, when AMD_SetHtPhyRegister hits the function 4 portal of link 1, the do/while loop waits for a completion bit to be set on a link that may not even be connected. According to the BKDG, the completion flag will never be set on an uninitialized link, hence the system hang.

Below is a fix for the above-mentioned function that produces correct offsets. I am not sure why this problem did not occur with previous revisions; my guess is that this defined behaviour is new to the C2 series. The problem is also highly topology dependent: on motherboards with all Opteron links properly connected and initialized, the only visible effect would be a missing switch to HT3 frequencies on link 0.

Hope this helps some people.

Best regards,
Maximilian Thuermer
Index: init_cpus.c
===================================================================
--- init_cpus.c (revision 4048)
+++ init_cpus.c (working copy)
@@ -705,37 +705,40 @@
  *
  * Returns the offset of the link register.
  */
-BOOL AMD_CpuFindCapability (u8 node, u8 cap_count, u8 *offset)
+BOOL AMD_CpuFindCapability (u8 node, u8 link_no, u8 *offset)
 {
 	u32 val;
+	u8 cap_count = 0;

 	/* get start of CPU HT Host Capabilities */
 	val = pci_read_config32(NODE_PCI(node, 0), 0x34);
 	val &= 0xFF;

-	cap_count++;
-	/* Traverse through the capabilities. */
-	do {
+	while((cap_count < link_no) && val)
+	{
 		val = pci_read_config32(NODE_PCI(node, 0), val);
 		/* Is the capability block a HyperTransport capability block? */
 		if ((val & 0xFF) == 0x08)
 			/* Is the HT capability block an HT Host Capability? */
 			if ((val & 0xE0000000) == (1 << 29))
-				cap_count--;
+				cap_count++;
 		val = (val >> 8) & 0xFF;
-	} while (cap_count && val);
+	}

 	*offset = (u8) val;

+	//printk_debug("Offset is 0x%x \n", (u8)val);
+
 	/* If requested capability found val != 0 */
-	if (!cap_count)
+	if (val != 0x00)
 		return TRUE;
 	else
 		return FALSE;
 }
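To make the off-by-one easier to see, here is a minimal, self-contained sketch that runs both walks against a mock config space. The mock layout (four chained HT host capability blocks at 0x80, 0xA0, 0xC0 and 0xE0), the helper names build_mock_config, old_find_cap and find_ht_host_cap, and the array-backed config reads are all assumptions made for illustration. It is neither the patch above nor the real coreboot PCI access code.

```c
#include <stdint.h>
#include <string.h>

#define CFG_DWORDS 64              /* 256-byte mock config space */
#define CAP_ID_HT  0x08u           /* HyperTransport capability ID */
#define HT_HOST    (1u << 29)      /* capability type: HT host */

/* Fill a mock function-0 config space with four chained HT host blocks
 * (assumed layout, for illustration only). */
static void build_mock_config(uint32_t *cfg)
{
	static const uint8_t blocks[] = { 0x80, 0xA0, 0xC0, 0xE0 };
	size_t i;

	memset(cfg, 0, CFG_DWORDS * sizeof(cfg[0]));
	cfg[0x34 / 4] = blocks[0];     /* capabilities pointer */
	for (i = 0; i < 4; i++) {
		uint32_t next = (i < 3) ? blocks[i + 1] : 0;
		cfg[blocks[i] / 4] = CAP_ID_HT | (next << 8) | HT_HOST;
	}
}

/* Pre-patch walk: hands back the pointer to the block AFTER the match. */
static int old_find_cap(const uint32_t *cfg, uint8_t cap_count, uint8_t *offset)
{
	uint32_t val = cfg[0x34 / 4] & 0xFF;

	cap_count++;
	do {
		val = cfg[val / 4];
		if ((val & 0xFF) == CAP_ID_HT && (val & 0xE0000000u) == HT_HOST)
			cap_count--;
		val = (val >> 8) & 0xFF;
	} while (cap_count && val);

	*offset = (uint8_t)val;
	return !cap_count;
}

/* Hypothetical walk that validates every block, including the first,
 * and returns the matching block's own offset. */
static int find_ht_host_cap(const uint32_t *cfg, uint8_t link_no, uint8_t *offset)
{
	uint8_t ptr = cfg[0x34 / 4] & 0xFF;
	uint8_t seen = 0;

	while (ptr) {
		uint32_t hdr = cfg[ptr / 4];

		if ((hdr & 0xFF) == CAP_ID_HT && (hdr & 0xE0000000u) == HT_HOST) {
			if (seen == link_no) {
				*offset = ptr;
				return 1;
			}
			seen++;
		}
		ptr = (hdr >> 8) & 0xFF;
	}
	return 0;
}
```

With this mock, old_find_cap reports 0xA0 for link 0, while find_ht_host_cap reports 0x80 for link 0 and 0xA0 for link 1.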
On Mon, Jun 15, 2009 at 11:25 AM, Maximilian Thuermer Maximilian.Thuermer@stud.uni-karlsruhe.de wrote:
Hi all,
while working on porting an HP DL 145G3 to K10 support with rev C2 Opterons, our team stumbled across the same problem Ward described in his thread. I tried out all errata fixes and the latest microcode patch files - no success.
greetings to Karlsruhe :-)
That is quite a nice catch; Marc and Yinghai, any comments here? I do not feel qualified to ACK this patch because I have no opterons to test on just now but possibly Ward can also tell us if it helps things.
Thanks again.
ron
On Mon, Jun 15, 2009 at 01:46:05PM -0700, ron minnich wrote:
On Mon, Jun 15, 2009 at 11:25 AM, Maximilian Thuermer Maximilian.Thuermer@stud.uni-karlsruhe.de wrote:
Hi all,
while working on porting an HP DL 145G3 to K10 support with rev C2 Opterons, our team stumbled across the same problem Ward described in his thread. I tried out all errata fixes and the latest microcode patch files - no success.
greetings to Karlsruhe :-)
That is quite a nice catch; Marc and Yinghai, any comments here? I do not feel qualified to ACK this patch because I have no opterons to test on just now but possibly Ward can also tell us if it helps things.
I will test this evening or tomorrow and let you all know.
Thanks! Ward.
Hi Maximilian,
thanks for your patch. Ward will test it and I hope we can commit it soon.
On 15.06.2009 20:25, Maximilian Thuermer wrote:
while working on porting an HP DL 145G3 to K10 support with rev C2 Opterons, our team stumbled across the same problem Ward described in his thread. I tried out all errata fixes and the latest microcode patch files - no success. We tracked down the problem to be in the CpuFindCapability function which returns faulty offsets to address the ht phy links.
Could you please give us a Signed-off-by statement for your patch? http://www.coreboot.org/Development_Guidelines#Sign-off_Procedure explains the details.
Regards, Carl-Daniel
Hi Maximilian,
Very good catch. I am surprised we didn't see this sooner.
The bug is that the code was always updating the passed value to the next link offset even when it was on the requested link (cap_count). I think that your patch has a slight problem in that it skips the CapID and CapType check on the first link, link_no == 0.
Attached is a slightly different fix (untested). I think that this function could be rewritten to be clearer, but this is what you get when trying to keep code similarity when going from asm to C.....
Please review and test.
Signed-off-by: Marc Jones marcj303@gmail.com
Thanks, Marc
Hi Marc,
On Mon, Jun 15, 2009 at 04:14:32PM -0600, Marc Jones wrote:
Very good catch. I am surprised we didn't see this sooner.
The bug is that the code was always updating the passed value to the next link offset even when it was on the requested link (cap_count). I think that your patch has a slight problem in that it skips the CapID and CapType check on the first link, link_no == 0.
Attached is a slightly different fix (untested). I think that this function could be rewritten to be clearer, but this is what you get when trying to keep code similarity when going from asm to C.....
Please review and test.
This does not appear to fix the hang for my board. Here's a boot log with lots of register dumping enabled:
http://ward.vandewege.net/coreboot/h8dmr/fam10/h8dmr-ae.cap
And here's one without the very lengthy dumps:
http://ward.vandewege.net/coreboot/h8dmr/fam10/h8dmr-af.cap
I see very little difference with the older dumps.
Thanks, Ward.
Hello, I realise your list is very technical and this is probably a daft set of questions, so please forgive me if it is cluttering up your list, but I couldn't find a more appropriate place to ask... so here goes.
My questions are about running coreboot on an ASUS KFN4-D16 with an NVIDIA CK804 chipset and a SST SST49LF080A (BIOS?) chip 33-4C-NHE 0631138-B I have 2 cpus one is a 65nm dual core opteron 2210, the second CPU is a quad core 45nm "Shanghai" opteron 2376 - this CPU isn't supported by the ASUS BIOS. My aim is to get the board to boot with the quad core CPU, I would be happy if it boots with support for all the RAM and at least one of the ethernet ports, I can live without PCI, SATA, USB etc.
My plan is to use a BIOS saviour and buy a second SST49LF080A chip and then
1. flash coreboot for K8 (?) with flashrom to verify that coreboot works on this board
2. flash coreboot for fam10 (?) with flashrom
does this sound like a good plan?
I have a couple of other questions
A. does FAM10 refer to 45nm Opterons (eg 2376)
B. I read through recent posts on the list and it seems that AMD are actively contributing to coreboot, is this correct?
C. It seems that support for the Shanghai processors is not complete but will probably be in the near future, is this correct?
D. I am slightly confused by North and South bridge: is the CK804 a southbridge? Do the Opteron CPUs provide their own northbridge?
E. what sort of BIOS chip / bios saviour kit should I use with this board?
output from flashrom, superiotool and lspci appended below,
many thanks for any help and good luck with your project,
Tom Ward
root@shed:/home/tom/coreboot/flashrom# ./flashrom
flashrom v0.9.0-r555
No coreboot table found.
Found chipset "NVIDIA CK804", enabling flash write... OK.
Calibrating delay loop... OK.
Found chip "SST SST49LF080A" (1024 KB) at physical address 0xfff00000.

superiotool r3695
Found Winbond W83627THF/THG (id=0x82, rev=0x84) at 0x2e

00:00.0 Memory controller: nVidia Corporation CK804 Memory Controller (rev a4)
00:01.0 ISA bridge: nVidia Corporation CK804 ISA Bridge (rev f1)
00:01.1 SMBus: nVidia Corporation CK804 SMBus (rev a2)
00:02.0 USB Controller: nVidia Corporation CK804 USB Controller (rev a2)
00:02.1 USB Controller: nVidia Corporation CK804 USB Controller (rev a4)
00:06.0 IDE interface: nVidia Corporation CK804 IDE (rev f3)
00:07.0 IDE interface: nVidia Corporation CK804 Serial ATA Controller (rev f3)
00:08.0 IDE interface: nVidia Corporation CK804 Serial ATA Controller (rev f3)
00:09.0 PCI bridge: nVidia Corporation CK804 PCI Bridge (rev f2)
00:0b.0 PCI bridge: nVidia Corporation CK804 PCIE Bridge (rev f3)
00:0c.0 PCI bridge: nVidia Corporation CK804 PCIE Bridge (rev f3)
00:0d.0 PCI bridge: nVidia Corporation CK804 PCIE Bridge (rev f3)
00:0e.0 PCI bridge: nVidia Corporation CK804 PCIE Bridge (rev a3)
00:18.0 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] HyperTransport Technology Configuration
00:18.1 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Address Map
00:18.2 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] DRAM Controller
00:18.3 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Miscellaneous Control
01:00.0 VGA compatible controller: nVidia Corporation G72 [GeForce 7300 LE] (rev a1)
02:00.0 PCI bridge: Intel Corporation 6702PXH PCI Express-to-PCI Bridge A (rev 09)
04:00.0 Ethernet controller: Broadcom Corporation NetXtreme BCM5721 Gigabit Ethernet PCI Express (rev 21)
05:00.0 Ethernet controller: Broadcom Corporation NetXtreme BCM5721 Gigabit Ethernet PCI Express (rev 21)
06:05.0 VGA compatible controller: ATI Technologies Inc ES1000 (rev 02)
On Tuesday 16 June 2009 11:36:02 Thomas Ward wrote:
Hi Thomas
My questions are about running coreboot on an ASUS KFN4-D16 with an NVIDIA CK804 chipset and a SST SST49LF080A (BIOS?) chip 33-4C-NHE 0631138-B I have 2 cpus one is a 65nm dual core opteron 2210, the second CPU is a quad core 45nm "Shanghai" opteron 2376 - this CPU isn't supported by the ASUS BIOS. My aim is to get the board to boot with the quad core CPU, I would be happy if it boots with support for all the RAM and at least one of the ethernet ports, I can live without PCI, SATA, USB etc.
Do you know if the board can physically support the CPU?
Do you want to run the board with a mixed setup of both CPUs? (i have no idea if that works, but i would bet on: NO)
A. does FAM10 refer to 45nm Opterons (eg 2376)
and to 65nm quad cores
C. It seems that support for the Shanghai processors is not complete but will probably be in the near future, is this correct?
Possible, it is unclear if the microcode patches are needed, but with them it does not work.
Is your shanghai a CPU from a store? (= no engineering sample)
D. I am slightly confused by North and South bridge, is the CK804 a southbridge?, do the opteron CPUs provide their own northbridge?
The terminology is that the northbridge is in the CPU and the chipset is the I/O hub.
Christian
Thanks Christian
My questions are about running coreboot on an ASUS KFN4-D16 with an NVIDIA CK804 chipset and a SST SST49LF080A (BIOS?) chip 33-4C-NHE 0631138-B I .... 45nm "Shanghai" opteron 2376 - this CPU isn't supported by the ASUS BIOS.
Do you know if the board can physically support the CPU?
According to ASUS "no", but I am guessing that this is because they haven't updated the BIOS I thought that all socket F boards should support shanghai opterons?
Do you want to run the board with a mixed setup of both CPUs?
not really, well I would if it was possible, I just assumed that it wasn't.
A. does FAM10 refer to 45nm Opterons (eg 2376)
and to 65nm quad cores
OK, thanks
Is your shanghai a CPU from a store? (= no engineering sample)
yes, from shop in UK
The terminology is that the northbridge is in the CPU
thanks, I think I understand, so older style boards had separate northbridge chips but the socket f boards + opteron have northbridge in the CPU?
Tom
On Tue, Jun 16, 2009 at 3:36 AM, Thomas Ward tomwardathome@yahoo.co.uk wrote:
Hello, I realise your list is very technical and this is probably a daft set of questions, so please forgive me if it is cluttering up your list, but I couldn't find a more appropriate place to ask... so here goes.
My questions are about running coreboot on an ASUS KFN4-D16 with an NVIDIA CK804 chipset and a SST SST49LF080A (BIOS?) chip 33-4C-NHE 0631138-B I have 2 cpus one is a 65nm dual core opteron 2210, the second CPU is a quad core 45nm "Shanghai" opteron 2376 - this CPU isn't supported by the ASUS BIOS. My aim is to get the board to boot with the quad core CPU, I would be happy if it boots with support for all the RAM and at least one of the ethernet ports, I can live without PCI, SATA, USB etc.
They should all work.
My plan is to use a BIOS saviour and buy a second SST49LF080A chip and then
1. flash coreboot for K8 (?) with flashrom to verify that coreboot works on this board
2. flash coreboot for fam10 (?) with flashrom
does this sound like a good plan?
Yes.
I have a couple of other questions
E. what sort of BIOS chip / bios saviour kit should I use with this board?
RD1
output from flashrom, superiotool and lspci appended below,
many thanks for any help and good luck with your project,
Tom Ward
root@shed:/home/tom/coreboot/flashrom# ./flashrom
flashrom v0.9.0-r555
No coreboot table found.
Found chipset "NVIDIA CK804", enabling flash write... OK.
Calibrating delay loop... OK.
Found chip "SST SST49LF080A" (1024 KB) at physical address 0xfff00000.
Hopefully this chip is socketed.
superiotool r3695
Found Winbond W83627THF/THG (id=0x82, rev=0x84) at 0x2e
This SuperIO is supported.
00:00.0 Memory controller: nVidia Corporation CK804 Memory Controller (rev a4)
This board is very similar to the tyan/s2892.
If I were you, my first step would be to get the BIOS savior (or just use the pushpin method)
http://www.coreboot.org/Developer_Manual
Once you can recover from a bad flash,
check out the latest coreboot-v2
mkdir src/mainboard/asus/kfn4-d16
svn cp src/mainboard/tyan/s2892/* src/mainboard/asus/kfn4-d16/
mkdir targets/asus/kfn4-d16
svn cp targets/tyan/s2892/Config.lb targets/asus/kfn4-d16/

edit src/mainboard/asus/kfn4-d16/Config.lb

Enable devices that are in your lspci, disable any that don't show up.
Don't worry about cards that you plug in, they'll be found automatically.
Change the SuperIO from chip superio/winbond/w83627hf to superio/winbond/w83627thf and change any settings there that you need to.
Change socket_940 to socket_F

(When you're ready to switch to fam10)
Change amdk8 to amdfam10 everywhere (may need some other small fixups)

edit targets/asus/kfn4-d16/Config.lb
Change s2892 to kfn4-d16
make sure ROM_SIZE matches the chip you're using.

cd targets
./buildtarget asus/kfn4-d16

make a payload (Maybe seabios)
cp your_payload targets/asus/kfn4-d16/kfn4-d16/payload.elf

make -C asus/kfn4-d16/kfn4-d16
Last step is to send your patches to the list with a Signed-off-by: <your-email> line.
Thanks, Myles
Myles Watson wrote:
E. what sort of BIOS chip / bios saviour kit should I use with this board?
There are many compatible flash chips. SST49LF080A should be fairly easy to get, maybe also Winbond W39V080A.
RD1
Make that RD1-LPC8.
(or just use the pushpin method)
If you do, please be very careful not to get cyanoacrylate super glue on your skin, I am still having some issues from glueing I did very long ago where I accidentally got some on me. Please use gloves. I wasn't thinking. :(
//Peter
Ward Vandewege schrieb:
Hi Marc,
On Mon, Jun 15, 2009 at 04:14:32PM -0600, Marc Jones wrote:
Very good catch. I am surprised we didn't see this sooner.
The bug is that the code was always updating the passed value to the next link offset even when it was on the requested link (cap_count). I think that your patch has a slight problem in that it skips the CapID and CapType check on the first link, link_no == 0.
Attached is a slightly different fix (untested). I think that this function could be rewritten to be clearer, but this is what you get when trying to keep code similarity when going from asm to C.....
Please review and test.
This does not appear to fix the hang for my board. Here's a boot log with lots of register dumping enabled:
http://ward.vandewege.net/coreboot/h8dmr/fam10/h8dmr-ae.cap
And here's one without the very lengthy dumps:
http://ward.vandewege.net/coreboot/h8dmr/fam10/h8dmr-af.cap
I see very little difference with the older dumps.
Thanks, Ward.
Hi Ward,
it's hard to tell from the logs, and I am not familiar with the board topology. However, if I read the output correctly, the code seems to perform alright on the first CPU but not on the second. I went through our code patches and discovered that there may be an additional fix you need to incorporate to get it working. The AMD_checkLinkType procedure only checks for ganged/unganged, HT1 vs. HT3 and so forth, but omits a check as to whether the link was initialized correctly (i.e. device present, no CRC errors on the link, and the like). We added a procedure that checks bits 4 and 5 of the link control register to see whether the link was initialized correctly and didn't suffer a link failure. The procedure is called just before the HtSetPhyRegister function is executed. I attached the procedure to make it clear. It is not a diff file because this should normally be contained somewhere in the checkLinkType function (up until now, it was just a sort of quick hack). Check whether this reports your link1 on cpu1 as unconnected. If so, it should solve your problem.
Good luck,
Maximilian
/**
 * AMD_checkLinkInitialized: check whether HT links are initialized correctly
 */
BOOL AMD_checkLinkInitialized(u8 node, u8 link, u8 regoff)
{
	u32 val;

	val = pci_read_config32(NODE_PCI(node, 0), regoff + 0x18);

	val &= 0x02; // link init complete

	if (val == 0) {
		//printk_debug("Link %d on node %d NOT active\n", link, node);
		return FALSE;
	} else {
		//printk_debug("Link %d on node %d active\n", link, node);
		val = pci_read_config32(NODE_PCI(node, 0), regoff + 0x04); // ht link control reg
		val = ((val & 0x30) >> 4); // bit 4 (link failure) and bit 5 (init complete)
		if (val == 0x2) { // binary 10: init complete and no link failure
			//printk_debug("Link %d on node %d has completed init has no link failure\n", link, node);
			return TRUE;
		} else {
			//printk_debug("Link %d on node %d failed to initialize\n", link, node);
			return FALSE;
		}
	}
}
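The bit test in that procedure can also be written with named masks, which makes it harder to mix up a shifted value with an unshifted one. The following is only a sketch: the names LINK_FAIL, INIT_COMPLETE, enum link_state and classify_link are made up for illustration, and it takes the raw link control register value as an argument instead of reading PCI config space.

```c
#include <stdint.h>

#define LINK_FAIL      (1u << 4)   /* link control register: link failure */
#define INIT_COMPLETE  (1u << 5)   /* link control register: initialization complete */

/* Hypothetical three-way result; the procedure above only returns TRUE/FALSE. */
enum link_state { LINK_DOWN, LINK_FAILED, LINK_OK };

/* Classify a link from the raw value of its link control register. */
static enum link_state classify_link(uint32_t link_control)
{
	if (link_control & LINK_FAIL)
		return LINK_FAILED;        /* failure reported, whatever else is set */
	if (!(link_control & INIT_COMPLETE))
		return LINK_DOWN;          /* never finished init, e.g. unconnected */
	return LINK_OK;                    /* init complete and no link failure */
}
```

Only LINK_OK corresponds to the TRUE case of AMD_checkLinkInitialized; both other states would mean the PHY registers of that link should be left alone.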
Hi Maximilian and Marc,
On Tue, Jun 16, 2009 at 12:07:10PM +0200, Maximilian Thuermer wrote:
it's hard to tell from the logs, and I am not familiar with the board topology. However, if I read the output correctly, the code seems to perform alright on the first CPU but not on the second. I went through our code patches and discovered that there may be an additional fix you need to incorporate to get it working. The AMD_checkLinkType procedure only checks for ganged/unganged, HT1 vs. HT3 and so forth, but omits a check as to whether the link was initialized correctly (i.e. device present, no CRC errors on the link, and the like). We added a procedure that checks bits 4 and 5 of the link control register to see whether the link was initialized correctly and didn't suffer a link failure. The procedure is called just before the HtSetPhyRegister function is executed. I attached the procedure to make it clear. It is not a diff file because this should normally be contained somewhere in the checkLinkType function (up until now, it was just a sort of quick hack). Check whether this reports your link1 on cpu1 as unconnected. If so, it should solve your problem.
Good luck,
I think this helps. Have a look at how I modified init_cpus.c:
http://ward.vandewege.net/coreboot/h8dmr/fam10/init_cpus-ak.c
Is that what you intended? Here's a boot log
http://ward.vandewege.net/coreboot/h8dmr/fam10/h8dmr-ak.cap
As you can see, it gets past the second CPU initialization, which is great. However, it soft resets itself a little further - but that's after (at least) two cores start to talk at the same time, so perhaps that's a different problem? Or maybe I didn't implement that call to AMD_checkLinkInitialized correctly?
I also tested with Marc's patch and your second one on top. Here's init_cpus.c:
http://ward.vandewege.net/coreboot/h8dmr/fam10/init_cpus-al.c
And a boot log:
http://ward.vandewege.net/coreboot/h8dmr/fam10/h8dmr-al.cap
The behaviour is different - here, booting hangs after it prints 'Start node 01 done.'
Any further thoughts?
Thanks, Ward.
Hi Ward,
... I think this helps. Have a look at how I modified init_cpus.c:
http://ward.vandewege.net/coreboot/h8dmr/fam10/init_cpus-ak.c
Is that what you intended? Here's a boot log
http://ward.vandewege.net/coreboot/h8dmr/fam10/h8dmr-ak.cap
As you can see, it gets past the second CPU initialization, which is great. However, it soft resets itself a little further - but that's after (at least) two cores start to talk at the same time, so perhaps that's a different problem? Or maybe I didn't implement that call to AMD_checkLinkInitialized correctly?
Yes, this is exactly the behaviour it is supposed to show. Also, you put the function call in the right place. An observation we made with our C2 Opterons is that the microcode patch caused resets of unknown origin. As a matter of fact, our machine kept resetting at the exact same spot as yours. Try commenting out the entry in the microcode lookup table and see how far your code gets. The rather awkward output of what seems to be multiple cores writing to the serial console at the same time has been discussed in other threads afaik and remains an issue to be solved. Up until now, it did not keep our systems from booting and only seems to be a temporary effect.
Best regards,
Maximilian
Hi Maximilian,
On Tue, Jun 16, 2009 at 06:11:55PM +0200, Maximilian Thuermer wrote:
Yes, this is exactly the behaviour it is supposed to show. Also, you put the function call in the right place. An observation we made with our C2 Opterons is that the microcode patch caused resets of unknown origin. As a matter of fact, our machine kept resetting at the exact same spot as yours. Try commenting out the entry in the microcode lookup table and see how far your code gets. The rather awkward output of what seems to be multiple cores writing to the serial console at the same time has been discussed in other threads afaik and remains an issue to be solved. Up until now, it did not keep our systems from booting and only seems to be a temporary effect.
Yep, confirmed. With Marc's latest patch and the microcode disabled I get all the way to the payload (of course there is the multi-core talking problem, as well as the relatively slow memory init) where it hangs:
http://ward.vandewege.net/coreboot/h8dmr/fam10/h8dmr-ao.cap
Hmm. I guess I'll need to dig a bit deeper there.
Thanks! Ward.
On Tue, Jun 16, 2009 at 4:07 AM, Maximilian Thuermer Maximilian.Thuermer@stud.uni-karlsruhe.de wrote:
it's hard to tell from the logs, and I am not familiar with the board topology. However, if I read the output correctly, the code seems to perform alright on the first CPU but not on the second. I went through our code patches and discovered that there may be an additional fix you need to incorporate to get it working. The AMD_checkLinkType procedure only checks for ganged/unganged, HT1 vs. HT3 and so forth, but omits a check as to whether the link was initialized correctly (i.e. device present, no CRC errors on the link, and the like). We added a procedure that checks bits 4 and 5 of the link control register to see whether the link was initialized correctly and didn't suffer a link failure. The procedure is called just before the HtSetPhyRegister function is executed. I attached the procedure to make it clear. It is not a diff file because this should normally be contained somewhere in the checkLinkType function (up until now, it was just a sort of quick hack). Check whether this reports your link1 on cpu1 as unconnected. If so, it should solve your problem.
Good luck,
Maximilian,
You found another bug. This one is in checkLinkType. It checks the connect, init complete, and type, but continues on regardless of whether the link is initialized. The caller expects the return value to be 0 if there were any problems with the link. Yeah, that is bad...
I don't think that you need the CRC error checking at this point (but it wouldn't hurt). I think that should be handled in the HT init code and only the init
Here is an updated patch (untested as usual).
Signed-off-by: Marc Jones marcj303@yahoo.com
Marc
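Independently of the link check, the wait loop described at the start of this thread could be given an upper bound, so that a link which never raises its completion bit costs a timeout instead of a hang. This is a minimal sketch under stated assumptions: poll_bit_bounded, read_reg_fn, struct mock_portal, mock_read and the DONE_BIT position are all hypothetical names and values chosen for illustration, and the register read is a callback so the example can run against a mock instead of real hardware.

```c
#include <stdint.h>

#define DONE_BIT (1u << 31)        /* assumed position of the completion bit */

/* Callback that returns the current value of the polled register. */
typedef uint32_t (*read_reg_fn)(void *ctx);

/* Poll until (value & mask) is non-zero or max_polls reads have been made.
 * Returns 1 if the bit was seen, 0 on timeout. */
static int poll_bit_bounded(read_reg_fn read_reg, void *ctx,
			    uint32_t mask, unsigned max_polls)
{
	while (max_polls--) {
		if (read_reg(ctx) & mask)
			return 1;
	}
	return 0;
}

/* Mock portal: reports completion on the ready_after-th read,
 * or never if ready_after is 0 (models an unconnected link). */
struct mock_portal {
	unsigned reads;
	unsigned ready_after;
};

static uint32_t mock_read(void *ctx)
{
	struct mock_portal *p = ctx;

	p->reads++;
	return (p->ready_after && p->reads >= p->ready_after) ? DONE_BIT : 0;
}
```

A caller would treat a 0 return the same way as an uninitialized link: skip the PHY setup for that link and carry on with the next one.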
On Tue, Jun 16, 2009 at 11:16:16AM -0600, Marc Jones wrote:
On Tue, Jun 16, 2009 at 4:07 AM, Maximilian Thuermer Maximilian.Thuermer@stud.uni-karlsruhe.de wrote:
it's hard to tell from the logs, and I am not familiar with the board topology. However, if I read the output correctly, the code seems to perform alright on the first CPU but not on the second. I went through our code patches and discovered that there may be an additional fix you need to incorporate to get it working. The AMD_checkLinkType procedure only checks for ganged/unganged, HT1 vs. HT3 and so forth, but omits a check as to whether the link was initialized correctly (i.e. device present, no CRC errors on the link, and the like). We added a procedure that checks bits 4 and 5 of the link control register to see whether the link was initialized correctly and didn't suffer a link failure. The procedure is called just before the HtSetPhyRegister function is executed. I attached the procedure to make it clear. It is not a diff file because this should normally be contained somewhere in the checkLinkType function (up until now, it was just a sort of quick hack). Check whether this reports your link1 on cpu1 as unconnected. If so, it should solve your problem.
Good luck,
Maximilian,
You found another bug. This one is in checkLinkType. It checks the connect, init complete, and type, but continues on regardless of whether the link is initialized. The caller expects the return value to be 0 if there were any problems with the link. Yeah, that is bad...
I don't think that you need the CRC error checking at this point (but it wouldn't hurt). I think that should be handled in the HT init code and only the init
Here is an updated patch (untested as usual).
Here's a boot log:
http://ward.vandewege.net/coreboot/h8dmr/fam10/h8dmr-am.cap
This patch now exhibits the same behaviour as Maximilian's two patches: no more hang at initialization of CPU1, but a soft reset a little further down. I guess we're not quite there yet, but this is definitely a step in the right direction.
Signed-off-by: Marc Jones marcj303@yahoo.com
Acked-by: Ward Vandewege ward@gnu.org
Thanks, Ward.
On Tue, Jun 16, 2009 at 12:48 PM, Ward Vandewege ward@gnu.org wrote:
On Tue, Jun 16, 2009 at 11:16:16AM -0600, Marc Jones wrote:
On Tue, Jun 16, 2009 at 4:07 AM, Maximilian Thuermer Maximilian.Thuermer@stud.uni-karlsruhe.de wrote:
it's hard to tell from the logs, and I am not familiar with the board topology. However, if I read the output correctly, the code seems to perform alright on the first CPU but not on the second. I went through our code patches and discovered that there may be an additional fix you need to incorporate to get it working. The AMD_checkLinkType procedure only checks for ganged/unganged, HT1 vs. HT3 and so forth, but omits a check as to whether the link was initialized correctly (i.e. device present, no CRC errors on the link, and the like). We added a procedure that checks bits 4 and 5 of the link control register to see whether the link was initialized correctly and didn't suffer a link failure. The procedure is called just before the HtSetPhyRegister function is executed. I attached the procedure to make it clear. It is not a diff file because this should normally be contained somewhere in the checkLinkType function (up until now, it was just a sort of quick hack). Check whether this reports your link1 on cpu1 as unconnected. If so, it should solve your problem.
Good luck,
Maximilian,
You found another bug. This one is in checkLinkType. It checks the connect, init complete, and type, but continues on regardless of whether the link is initialized. The caller expects the return value to be 0 if there were any problems with the link. Yeah, that is bad...
I don't think that you need the CRC error checking at this point (but it wouldn't hurt). I think that should be handled in the HT init code and only the init
Here is an updated patch (untested as usual).
Here's a boot log:
http://ward.vandewege.net/coreboot/h8dmr/fam10/h8dmr-am.cap
This patch now exhibits the same behaviour as Maximilian's two patches: no more hang at initialization of CPU1, but a soft reset a little further down. I guess we're not quite there yet, but this is definitely a step in the right direction.
Signed-off-by: Marc Jones marcj303@yahoo.com
Acked-by: Ward Vandewege ward@gnu.org
Thanks, Ward.
Did you test with defaults.h errata patch as well? Can you ack it too?
r4358
Marc
On Tue, Jun 16, 2009 at 05:03:10PM -0600, Marc Jones wrote:
Did you test with defaults.h errata patch as well? Can you ack it too?
Yes, sorry - that was on top of your latest errata patch. I just acked it.
Thanks! Ward.
Marc Jones schrieb:
Hi Maximilian,
Very good catch. I am surprised we didn't see this sooner.
The bug is that the code was always updating the passed value to the next link offset even when it was on the requested link (cap_count). I think that your patch has a slight problem in that it skips the CapID and CapType check on the first link, link_no == 0.
Attached is a slightly different fix (untested). I think that this function could be rewritten to be clearer, but this is what you get when trying to keep code similarity when going from asm to C.....
Please review and test.
Signed-off-by: Marc Jones marcj303@gmail.com
Thanks, Marc
Hi Marc,
this patch looks like it should do the trick. Can't test it until next Monday though, but I will let you know how it turned out...
Thanks,
Maximilian
Hi Maximilian,
On Mon, Jun 15, 2009 at 08:25:47PM +0200, Maximilian Thuermer wrote:
while working on porting an HP DL 145G3 to K10 support with rev C2 Opterons, our team stumbled across the same problem Ward described in his thread. I tried out all errata fixes and the latest microcode patch files - no success. We tracked down the problem to be in the CpuFindCapability function which returns
... snip ...
Opteron links properly connected and initialized, the only effect one would see is a missing switch to HT3 frequencies on link0.
Impressive! But, oddly it does not seem to fix the problem for my board (supermicro h8dmr). Did this really fix the problem for you? Here's a boot log:
http://ward.vandewege.net/coreboot/h8dmr/fam10/h8dmr-ag.cap
Thanks, Ward.