Looking at your change 41369 (soc/amd/stoneyridge: add resources during read_resources()), I tried a similar change on top of your 3 fixes above, and surprisingly it worked on the first try: now I'm able to see the boot devices and floppies. The new bootlog is attached. After testing it more (it should boot 100% of the time), I'm going to submit it to review.coreboot.org soon for your review. If this one succeeds, we will also need a similar change for family14 and family16kb.
diff --git a/src/northbridge/amd/agesa/family15tn/northbridge.c b/src/northbridge/amd/agesa/family15tn/northbridge.c
index 9d41e7a1f1..0194ea82ea 100644
--- a/src/northbridge/amd/agesa/family15tn/northbridge.c
+++ b/src/northbridge/amd/agesa/family15tn/northbridge.c
@@ -666,6 +666,8 @@ static void domain_set_resources(struct device *dev)
 	u32 reset_memhole = 1;
 #endif
 
+	domain_read_resources(dev);
+
 	pci_tolm = 0xffffffffUL;
 	for (link = dev->link_list; link; link = link->next) {
 		pci_tolm = find_pci_tolm(link);
@@ -749,17 +751,18 @@ static void domain_set_resources(struct device *dev)
 	}
 
 	add_uma_resource_below_tolm(dev, 7);
-
+/*
 	for (link = dev->link_list; link; link = link->next) {
 		if (link->children) {
 			assign_resources(link);
 		}
 	}
+*/
 }
 
 static struct device_operations pci_domain_ops = {
-	.read_resources = domain_read_resources,
-	.set_resources = domain_set_resources,
+	.read_resources = domain_set_resources,
+	.set_resources = pci_domain_set_resources,
 	.scan_bus = pci_domain_scan_bus,
 };
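The point of the change above (and of 41369) seems to be ordering: the domain's fixed resources, such as RAM and the UMA region added by add_uma_resource_below_tolm(), have to be reported while resources are being read, i.e. before the allocator picks addresses, instead of from set_resources, which runs afterwards. The sketch below models only that ordering under heavy simplification. `struct domain`, `struct domain_ops`, `report_ram()`, `flow_detects_collision()` and the 768 MiB RAM size are all hypothetical; they are not coreboot's `struct device` / `struct device_operations` or the real family15tn code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_FIXED 8

/* Hypothetical half-open range [base, base + size). */
struct range {
	uint64_t base;
	uint64_t size;
};

/* Hypothetical, heavily simplified stand-in for a PCI domain device. */
struct domain {
	struct range fixed[MAX_FIXED]; /* ranges the allocator must avoid */
	size_t nfixed;
};

/* Hypothetical ops table with the two phases the patch rearranges. */
struct domain_ops {
	void (*read_resources)(struct domain *d); /* runs before allocation */
	void (*set_resources)(struct domain *d);  /* runs after allocation */
};

static void add_fixed(struct domain *d, uint64_t base, uint64_t size)
{
	assert(d->nfixed < MAX_FIXED);
	d->fixed[d->nfixed].base = base;
	d->fixed[d->nfixed].size = size;
	d->nfixed++;
}

/* Report 768 MiB of RAM at address 0 (example value only). */
static void report_ram(struct domain *d)
{
	add_fixed(d, 0, 768ULL << 20);
}

static void no_op(struct domain *d)
{
	(void)d;
}

/* Old layout: RAM is only reported from set_resources, i.e. too late. */
static const struct domain_ops old_ops = {
	.read_resources = no_op,
	.set_resources = report_ram,
};

/* Patched layout: RAM is reported while resources are being read. */
static const struct domain_ops new_ops = {
	.read_resources = report_ram,
	.set_resources = no_op,
};

static bool overlaps(struct range a, struct range b)
{
	return a.base < b.base + b.size && b.base < a.base + a.size;
}

/*
 * Simplified flow: read phase, then the "allocator" checks a candidate
 * address against what it knows so far, then the set phase.
 * Returns true if the collision was visible at allocation time.
 */
static bool flow_detects_collision(const struct domain_ops *ops,
				   struct range candidate)
{
	struct domain d = { .nfixed = 0 };
	bool seen = false;

	ops->read_resources(&d);
	for (size_t i = 0; i < d.nfixed; i++)
		if (overlaps(candidate, d.fixed[i]))
			seen = true;
	ops->set_resources(&d);
	return seen;
}
```

Under these assumptions, a candidate BAR at address 0 is only recognized as colliding with RAM in the patched layout; with the old layout the RAM range does not exist yet when the address is chosen.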
On Fri, May 15, 2020 at 12:10 PM Mike Banon mikebdp2@gmail.com wrote:
Although the result is still the same even with the three changes (randomly, it either can't boot or has no boot devices), there is one positive effect: the USB FT232H log no longer stops, so I'm finally able to share a full log for a boot problem. Please compare these two logs:
1) ok_e6fb1344ed9188e19be4b54bdf1a76680b8c4523.txt - boot log for the last "working" commit (before the allocator changes)
2) 3fixes.txt - boot log with the 3 changes applied on top of 6b95507ec5b087658178a325bdc68570bc48bb20 (after the allocator changes)

I hope this comparison gives enough clues about how to fix it further, and I'll happily test your new changes aimed at fixing this.
Best regards, Mike Banon
On Fri, May 15, 2020 at 2:44 AM Furquan Shaikh furquan.m.shaikh@gmail.com wrote:
I have uploaded 2 changes on top of Aaron's change. Can you please give these three changes a try:
https://review.coreboot.org/c/coreboot/+/41363
https://review.coreboot.org/c/coreboot/+/41418
https://review.coreboot.org/c/coreboot/+/41419
Thank you!
- Furquan
On Thu, May 14, 2020 at 4:16 PM Aaron Durbin adurbin@google.com wrote:
On Thu, May 14, 2020 at 3:46 PM Aaron Durbin adurbin@google.com wrote:
On Thu, May 14, 2020 at 2:46 PM Mike Banon mikebdp2@gmail.com wrote:
Unfortunately, it seems a lot of boards are affected by this. The A88XM-E and Lenovo G505S (AMD fam15h) also broke: they rarely succeed at booting, and when they do, no boot devices are available (not even the virtual floppies, for some reason) except the coreinfo/tint secondary payloads, which have become prone to freezing. I attach the A88XM-E logs I've been able to obtain with USB FT232H:
1) ok_e6fb1344ed9188e19be4b54bdf1a76680b8c4523.txt - last coreboot repo revision where everything works
2) fail_1_3b02006afe8a85477dafa1bd149f1f0dba02afc7.txt - the commit that broke the boards for the first time
3) fail_2_6b95507ec5b087658178a325bdc68570bc48bb20.txt - log for the top of coreboot's master
For some reason logs for 2) and 3) always stop after "PCI: 00:12.2 EHCI Debug Port hook triggered".
I hope these commits can be reverted until we figure out what's going on with them. Good thing we noticed it quickly enough.
Thanks, Mike. The amd chipset code (all of it from what I can tell) is fundamentally broken and at odds with all of the resource allocation flow. They worked previously because dynamic resources were being assigned using an algorithm that just assumed there weren't collisions, and that was done w/o all the necessary info required for making the proper decisions regarding dynamic resource allocation.
I landed the other chipsets' fixes, but the amd chipset code is going to take a lot more to fix. Would you be willing to test patches as they are crafted? Given the size of the problem and how gnarly the amd chipset code is, it's going to take some time, so I think we do need to revert the allocator changes until we can do some housekeeping.
I was just brainstorming with Furquan. He did push the revert changes, but we were scheming on a patch that I was hoping affected parties could try in conjunction with https://review.coreboot.org/c/coreboot/+/41363. Basically, we'll allocate top-down like the previous allocator did, hoping for no collisions. Let's try that and see where we land. Regardless, we need to fix this amd chipset code, as it's a major liability.
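Allocating top-down means serving each request from the highest suitably aligned address that still fits in the window, then lowering the window's limit below it. The sketch below assumes a single window with an inclusive limit; `struct window`, `align_down()` and `alloc_top_down()` are hypothetical names, not the coreboot allocator's API. It deliberately never consults other reservations, so it is only collision-free if the window already excludes RAM and fixed resources, which is the "hoping for no collisions" caveat.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical allocation window [base, limit], limit inclusive. */
struct window {
	uint64_t base;
	uint64_t limit;
};

/* Round addr down to a multiple of align (align must be a power of two). */
static uint64_t align_down(uint64_t addr, uint64_t align)
{
	assert(align != 0 && (align & (align - 1)) == 0);
	return addr & ~(align - 1);
}

/*
 * Take `size` bytes aligned to `align` from the top of the window and
 * shrink the window below them. Returns false if the request does not fit.
 * No check against other reservations is made.
 */
static bool alloc_top_down(struct window *w, uint64_t size, uint64_t align,
			   uint64_t *out)
{
	uint64_t start;

	if (size == 0 || w->base > w->limit || size - 1 > w->limit)
		return false;
	start = align_down(w->limit - (size - 1), align);
	if (start < w->base)
		return false;

	*out = start;
	if (start == 0) {
		/* Window exhausted: mark it empty (base > limit). */
		w->base = 1;
		w->limit = 0;
	} else {
		w->limit = start - 1;
	}
	return true;
}
```

For example, with a window of [0x10000000, 0x1fffffff], a 16 MiB request aligned to 16 MiB lands at 0x1f000000, and a following 4 KiB request lands right below it at 0x1efff000. Serving requests from the top keeps dynamic resources as far as possible from RAM at low addresses, which is presumably why this approach tends to avoid collisions in practice.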
-Aaron
Best regards, Mike Banon
On Thu, May 14, 2020 at 8:47 PM Keith Hui buurin@gmail.com wrote:
Hi guys,
31ab7de51a is CB:41368, cherry picked into my local repo.
Turns out I have to back out all four of Furquan's patches (CB:39486~39489) for my board to boot normally again.
Thoughts?
I'll now get a log with everything in at SPEW.
On Thu, May 14, 2020 at 1:05 PM Aaron Durbin adurbin@google.com wrote:
>
> Keith, is it possible to have the console log level set to SPEW? I'm not seeing the full logs to piece it all together.
>
> Allocating resources...
> Reading resources...
> Setting RAM size to 768 MB
> PNP: 03f0.8 missing read_resources
> Done reading resources.
> Resource allocator: DOMAIN: 0000 - Pass 1 (gathering requirements)
> Resource allocator: DOMAIN: 0000 - Pass 2 (allocating resources)
> Resource ranges:
> Base: 1000, Size: d000, Tag: 100
> Base: f000, Size: 1000, Tag: 100
> Resource ranges:
> Base: 0, Size: ff800000, Tag: 200
> Base: 100000000, Size: f00000000, Tag: 100200
> Resource ranges:
> Base: 10000000, Size: 8000000, Tag: 1200
> Resource ranges:
> Base: 18000000, Size: 1100000, Tag: 200
>
> This is the memory address space:
> Base: 0, Size: ff800000, Tag: 200
> Base: 100000000, Size: f00000000, Tag: 100200
>
> Those are valid ranges to choose dynamic resources from.
>
> PCI: 00:00.0 10 <- [0x0000000000 - 0x000fffffff] size 0x10000000 gran 0x1c prefmem
>
> I see 'Setting RAM size to 768 MB' which means I would expect to see a hole in the ranges representing 768MiB.
>
> That would be bad. I don't know what commit '31ab7de51a' is, but it might not contain the CB:41368. Having SPEW logs would be helpful.
>
> Also, what mainboard Kconfig are you selecting for p3bf? src/mainboard/asus/p2b ?
>
>
>
> On Thu, May 14, 2020 at 10:42 AM Keith Hui buurin@gmail.com wrote:
>>
>> (Temporarily leaving the list out)
>>
>> Hi Aaron,
>>
>> Here is a log with everything including CB:41368 included. I'll get
>> this log out to you first, while I try a build with all problem
>> commits left out.
>>
>> Thanks
>> Keith
>>
>> On Thu, May 14, 2020 at 12:53 AM Aaron Durbin adurbin@google.com wrote:
>> >
>> >
>> >
>> > On Wed, May 13, 2020 at 10:51 PM Keith Hui buurin@gmail.com wrote:
>> >>
>> >> Hi guys,
>> >>
>> >> I tested these fixes on my board, and I have to say there's still
>> >> something wrong. They did address the hang or reset in SeaBIOS I first
>> >> described, but now either my ATA hard drive failed to boot (it tried
>> >> to hand off to GRUB on my drive, but didn't get there), or it can't
>> >> find the option ROM of my video card, meaning no display.
>> >>
>> >> Now I want to try the other way, testing a build with all changes
>> >> related to the problem backed out instead. So besides the one I first
>> >> identified, what other related patches should I try backing out?
>> >
>> >
>> > Just go to the parent of the identified patch. As for the other symptoms you are seeing, I'd love to see logs with the patches we identified so we can root cause.
>> >
>> > Thanks.
>> >
>> > -Aaron
>> >
>> >>
>> >> On Wed, May 13, 2020 at 11:54 PM Furquan Shaikh
>> >> furquan.m.shaikh@gmail.com wrote:
>> >> >
>> >> > Similar fix for i440x: https://review.coreboot.org/c/coreboot/+/41368
>> >> >
>> >> > On Wed, May 13, 2020 at 11:29 AM Aaron Durbin adurbin@google.com wrote:
>> >> > >
>> >> > > i440x chipset is doing things in the wrong way like sandybridge. I uploaded this fix for sandy: https://review.coreboot.org/c/coreboot/+/41364 We'll need to do the equivalent for i440x.
>> >> > >
>> >> > > On Wed, May 13, 2020 at 11:13 AM Aaron Durbin adurbin@google.com wrote:
>> >> > >>
>> >> > >> OK. I'll take a look at your logs and see what's going on. The patch link I sent was based off of someone else's mainboard logs.
>> >> > >>
>> >> > >> On Wed, May 13, 2020 at 10:59 AM Keith Hui buurin@gmail.com wrote:
>> >> > >>>
>> >> > >>> Hi Aaron,
>> >> > >>>
>> >> > >>> It didn't help.
>> >> > >>> There's still a way-out-of-whack entry in the coreboot
>> >> > >>> table, and an e820 entry ending at 000003ffffffffff, which I think
>> >> > >>> goes beyond the scope of 41363.
>> >> > >>>
>> >> > >>> Keith
>> >> > >>>
>> >> > >>> On Wed, May 13, 2020 at 12:24 PM Aaron Durbin adurbin@google.com wrote:
>> >> > >>> >
>> >> > >>> > I think the following patch will fix things up: https://review.coreboot.org/c/coreboot/+/41363 Please let me know.
>> >> > >>> >
>> >> > >>> > On Wed, May 13, 2020 at 8:43 AM Keith Hui buurin@gmail.com wrote:
>> >> > >>> >>
>> >> > >>> >> Thanks Furquan.
>> >> > >>> >>
>> >> > >>> >> Here are 3 logs. Log 1 is at the commit just before the problem. Log 2
>> >> > >>> >> is at the problem commit. Log 3 is at the current master, if that's
>> >> > >>> >> what you meant by ToT.
>> >> > >>> >>
>> >> > >>> >> I'm using SeaBIOS 1.13.0, compiled once using the attached .config
>> >> > >>> >> before taking these logs. All 3 runs are taken using the same SeaBIOS
>> >> > >>> >> binary.
>> >> > >>> >>
>> >> > >>> >> Then I recompiled SeaBIOS with CONFIG_RELOCATE_INIT off, replaced the
>> >> > >>> >> payload used in run 3, and took an extra run. In this case the board
>> >> > >>> >> reset on its own at "Scanning option roms", looping infinitely.
>> >> > >>> >>
>> >> > >>> >> Hope this helps
>> >> > >>> >> Keith
>> >> > >>> >>
>> >> > >>> >> On Wed, May 13, 2020 at 7:38 AM Furquan Shaikh
>> >> > >>> >> furquan.m.shaikh@gmail.com wrote:
>> >> > >>> >> >
>> >> > >>> >> > Thanks for the report, Keith!
>> >> > >>> >> >
>> >> > >>> >> > On Wed, May 13, 2020 at 3:42 AM Paul Menzel pmenzel@molgen.mpg.de wrote:
>> >> > >>> >> > >
>> >> > >>> >> > > Dear Keith,
>> >> > >>> >> > >
>> >> > >>> >> > >
>> >> > >>> >> > > On 13.05.20 at 05:21, Keith Hui wrote:
>> >> > >>> >> > >
>> >> > >>> >> > > > I am still refining the P2B family of boards, now including the
>> >> > >>> >> > > > infamous P3B-F with an unusual appetite for hacks to make work.
>> >> > >>> >> > > >
>> >> > >>> >> > > > That said, I'm now finding that, on P3B-F, SeaBIOS hangs when it tries
>> >> > >>> >> > > > to relocate itself as part of its usual chores. Having just learned
>> >> > >>> >> > > > git bisect, I decided to try it out.
>> >> > >>> >> > > >
>> >> > >>> >> > > > It was commit 3b02006afe8a85477dafa1bd149f1f0dba02afc7 [1] that broke
>> >> > >>> >> > > > my SeaBIOS. It doesn't affect my newer toy, the P8Z77-M, as much as
>> >> > >>> >> > > > P3B-F, but I still want to blame that, and probably the very next
>> >> > >>> >> > > > commit as well, as they both deal with some very modern aspects of PCI
>> >> > >>> >> > > > that well postdate the 440BX.
>> >> > >>> >> > > >
>> >> > >>> >> > > > Is there anything we can do to fix 3b02006afe?
>> >> > >>> >> > >
>> >> > >>> >> > > I commented in the change-set [1] to make the author and reviewers aware
>> >> > >>> >> > > of this issue, referenced your list message, and asked them to comment here.
>> >> > >>> >> > >
>> >> > >>> >> > > Could you please provide the debug log of coreboot and SeaBIOS?
>> >> > >>> >> >
>> >> > >>> >> > As Paul mentioned, can you please provide the debug logs for coreboot
>> >> > >>> >> > and SeaBIOS both with ToT coreboot and with HEAD set before the change
>> >> > >>> >> > 3b02006afe where it does not hang? Thanks!
>> >> > >>> >> >
>> >> > >>> >> > >
>> >> > >>> >> > >
>> >> > >>> >> > > > Meanwhile I ported the P3B-F board enable to flashrom [2], which got a
>> >> > >>> >> > > > heavy workout during this bisect, through vendor firmware and both
>> >> > >>> >> > > > good and bad builds of coreboot. In all cases I can flash internal, no
>> >> > >>> >> > > > longer having to haul out my P2B-LS just to use it as a flasher.
>> >> > >>> >> > > >
>> >> > >>> >> > > > Enjoy this long overdue board enable. If it gets submitted, I'll
>> >> > >>> >> > > > retract the ramstage hack [3] doing the same as redundant.
>> >> > >>> >> > >
>> >> > >>> >> > > Very nice!
>> >> > >>> >> > > It's always amazing how, after so many years, when the vendor has
>> >> > >>> >> > > already stopped supporting the device, the community still supports the
>> >> > >>> >> > > device and improves the firmware, showing that Free Software is the more
>> >> > >>> >> > > sustainable way.
>> >> > >>> >> > >
>> >> > >>> >> > >
>> >> > >>> >> > > Kind regards,
>> >> > >>> >> > >
>> >> > >>> >> > > Paul
>> >> > >>> >> > >
>> >> > >>> >> > >
>> >> > >>> >> > > > [1] https://review.coreboot.org/c/coreboot/+/39486
>> >> > >>> >> > > > [2] https://review.coreboot.org/c/flashrom/+/41354
>> >> > >>> >> > > > [3] https://review.coreboot.org/c/coreboot/+/41224
>> >> > >>> >> > > _______________________________________________
>> >> > >>> >> > > coreboot mailing list -- coreboot@coreboot.org
>> >> > >>> >> > > To unsubscribe send an email to coreboot-leave@coreboot.org
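The address arithmetic behind two observations in the quoted P3B-F thread can be checked with a small sketch. First, 768 MiB of RAM starting at 0 occupies [0, 0x30000000), so the prefmem BAR that 00:00.0 received at [0x0000000000 - 0x000fffffff] sits inside RAM unless that hole is removed from the "Base: 0, Size: ff800000" memory range. Second, an e820 entry ending at 000003ffffffffff needs 42 address bits, while the allocator's own upper range (0x100000000 + 0xf00000000 = 0x1000000000) ends at 2^36. The type `struct span` and the helpers `spans_overlap()`, `carve_above_ram()` and `phys_end_fits()` are hypothetical, not coreboot or SeaBIOS APIs, and the carve-out deliberately handles only RAM that starts at or below the window base.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MIB (1ULL << 20)

/* Hypothetical half-open address range [base, base + size). */
struct span {
	uint64_t base;
	uint64_t size;
};

static uint64_t span_end(struct span s)
{
	return s.base + s.size;
}

/* True if the two half-open ranges share at least one address. */
static bool spans_overlap(struct span a, struct span b)
{
	return a.base < span_end(b) && b.base < span_end(a);
}

/*
 * Remove RAM that starts at or below the window base from the window,
 * keeping only the part above it. Simplified: holes in the middle of
 * the window are not handled.
 */
static struct span carve_above_ram(struct span window, struct span ram)
{
	assert(ram.base <= window.base);
	if (span_end(ram) <= window.base)
		return window; /* RAM ends below the window: nothing to remove */
	if (span_end(ram) >= span_end(window))
		return (struct span){ span_end(window), 0 }; /* fully covered */
	return (struct span){ span_end(ram), span_end(window) - span_end(ram) };
}

/* True if an inclusive end address fits in `bits` physical address bits. */
static bool phys_end_fits(uint64_t end_inclusive, unsigned int bits)
{
	assert(bits > 0 && bits < 64);
	return end_inclusive <= (1ULL << bits) - 1;
}
```

With RAM = { 0, 768 * MIB } and window = { 0, 0xff800000 }, the usable range becomes [0x30000000, 0xff800000), which no longer overlaps a BAR at [0, 0x10000000); and `phys_end_fits(0x3ffffffffff, 36)` is false, which is one way to see why that e820 entry looks out of whack.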