Hi guys,
31ab7de51a is CB:41368, cherry picked into my local repo.
Turns out I have to back out all four of Furquan's patches (CB:39486~39489) for my board to boot normally again.
Thoughts?
I'll now get a log with everything in at SPEW.
On Thu, May 14, 2020 at 1:05 PM Aaron Durbin adurbin@google.com wrote:
Keith, is it possible to have the console log level set to SPEW? I'm not seeing the full logs to piece it all together.
Allocating resources...
Reading resources...
Setting RAM size to 768 MB
PNP: 03f0.8 missing read_resources
Done reading resources.
Resource allocator: DOMAIN: 0000 - Pass 1 (gathering requirements)
Resource allocator: DOMAIN: 0000 - Pass 2 (allocating resources)
Resource ranges:
Base: 1000, Size: d000, Tag: 100
Base: f000, Size: 1000, Tag: 100
Resource ranges:
Base: 0, Size: ff800000, Tag: 200
Base: 100000000, Size: f00000000, Tag: 100200
Resource ranges:
Base: 10000000, Size: 8000000, Tag: 1200
Resource ranges:
Base: 18000000, Size: 1100000, Tag: 200

This is the memory address space:
Base: 0, Size: ff800000, Tag: 200
Base: 100000000, Size: f00000000, Tag: 100200
Those are valid ranges to choose dynamic resources from.
PCI: 00:00.0 10 <- [0x0000000000 - 0x000fffffff] size 0x10000000 gran 0x1c prefmem
I see 'Setting RAM size to 768 MB' which means I would expect to see a hole in the ranges representing 768MiB.
That would be bad. I don't know what commit '31ab7de51a' is, but it might not contain CB:41368. Having SPEW logs would be helpful.
Also, what mainboard Kconfig are you selecting for p3bf? src/mainboard/asus/p2b ?
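The expectation in the message above (768 MiB of RAM should show up as a hole in the memory ranges) can be checked by hand. What follows is a minimal Python sketch, not coreboot code. It assumes RAM is one contiguous block starting at address 0, which ignores legacy holes, and the helper names `subtract` and `overlaps` are made up for illustration. The window and BAR values are copied from the log excerpt.

```python
MiB = 1 << 20

def subtract(window, hole):
    """Remove `hole` from `window`; both are (base, size). Returns leftover pieces."""
    w_base, w_size = window
    h_base, h_size = hole
    w_end, h_end = w_base + w_size, h_base + h_size
    pieces = []
    if h_base > w_base:
        # Part of the window below the hole survives.
        pieces.append((w_base, min(h_base, w_end) - w_base))
    if h_end < w_end:
        # Part of the window above the hole survives.
        start = max(h_end, w_base)
        pieces.append((start, w_end - start))
    return pieces

def overlaps(a, b):
    """True if two (base, size) ranges share at least one address."""
    return a[0] < b[0] + b[1] and b[0] < a[0] + a[1]

# Assumption: RAM is a single block at 0 ("Setting RAM size to 768 MB").
ram = (0x0, 768 * MiB)

# Memory windows as printed in the log.
logged_windows = [(0x0, 0xff800000), (0x100000000, 0xf00000000)]

# What the windows would look like with the RAM hole carved out.
expected_windows = [p for w in logged_windows for p in subtract(w, ram)]
for base, size in expected_windows:
    print(f"Base: {base:x}, Size: {size:x}")

# Assignment logged for PCI: 00:00.0 BAR 10.
bar = (0x0, 0x10000000)
print("BAR overlaps RAM:", overlaps(bar, ram))
```

Under that assumption the first window would start at 0x30000000 instead of 0, and the logged assignment at [0x0 - 0x0fffffff] lands inside RAM.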
On Thu, May 14, 2020 at 10:42 AM Keith Hui buurin@gmail.com wrote:
(Temporarily leaving the list out)
Hi Aaron,
Here is a log with everything applied, including CB:41368. I'll get this log out to you first, while I try a build with all problem commits left out.
Thanks
Keith
On Thu, May 14, 2020 at 12:53 AM Aaron Durbin adurbin@google.com wrote:
On Wed, May 13, 2020 at 10:51 PM Keith Hui buurin@gmail.com wrote:
Hi guys,
I tested these fixes on my board, and I have to say there's still something wrong. They did address the hang or reset in SeaBIOS I first described, but now either my ATA hard drive failed to boot (it tried to hand off to GRUB on my drive, but didn't get there), or it can't find the option ROM of my video card, meaning no display.
Now I want to try the other way, testing a build with all changes related to the problem backed out instead. So besides the one I first identified, what other related patches should I try backing out?
Just go to the parent of the identified patch. As for the other symptoms you are seeing, I'd love to see logs with the patches we identified so we can root cause.
Thanks.
-Aaron
On Wed, May 13, 2020 at 11:54 PM Furquan Shaikh furquan.m.shaikh@gmail.com wrote:
Similar fix for i440x: https://review.coreboot.org/c/coreboot/+/41368
On Wed, May 13, 2020 at 11:29 AM Aaron Durbin adurbin@google.com wrote:
i440x chipset is doing things in the wrong way like sandybridge. I uploaded this fix for sandy: https://review.coreboot.org/c/coreboot/+/41364 We'll need to do the equivalent for i440x.
On Wed, May 13, 2020 at 11:13 AM Aaron Durbin adurbin@google.com wrote:
>
> OK. I'll take a look at your logs and see what's going on. The patch link I sent was based off of someone else's mainboard logs.
>
> On Wed, May 13, 2020 at 10:59 AM Keith Hui buurin@gmail.com wrote:
>>
>> Hi Aaron,
>>
>> It didn't help. There's still a way-out-of-whack entry in the coreboot
>> table and an e820 entry ending at 000003ffffffffff, which I think goes
>> beyond the scope of 41363.
>>
>> Keith
>>
>> On Wed, May 13, 2020 at 12:24 PM Aaron Durbin adurbin@google.com wrote:
>> >
>> > I think the following patch will fix things up: https://review.coreboot.org/c/coreboot/+/41363 Please let me know.
>> >
>> > On Wed, May 13, 2020 at 8:43 AM Keith Hui buurin@gmail.com wrote:
>> >>
>> >> Thanks Furquan.
>> >>
>> >> Here are 3 logs. Log 1 is at the commit just before the problem. Log 2
>> >> is at the problem commit. Log 3 is at the current master, if that's
>> >> what you meant by ToT.
>> >>
>> >> I'm using SeaBIOS 1.13.0, compiled once using the attached .config
>> >> before taking these logs. All 3 runs are taken using the same SeaBIOS
>> >> binary.
>> >>
>> >> Then I recompiled SeaBIOS with CONFIG_RELOCATE_INIT off, replaced the
>> >> payload used in run 3, and took an extra run. In this case the board
>> >> reset on its own at "Scanning option roms", looping infinitely.
>> >>
>> >> Hope this helps
>> >> Keith
>> >>
>> >> On Wed, May 13, 2020 at 7:38 AM Furquan Shaikh
>> >> furquan.m.shaikh@gmail.com wrote:
>> >> >
>> >> > Thanks for the report Keith!
>> >> >
>> >> > On Wed, May 13, 2020 at 3:42 AM Paul Menzel pmenzel@molgen.mpg.de wrote:
>> >> > >
>> >> > > Dear Keith,
>> >> > >
>> >> > > On 13.05.20 at 05:21, Keith Hui wrote:
>> >> > >
>> >> > > > I am still refining the P2B family of boards, now including the
>> >> > > > infamous P3B-F with an unusual appetite for hacks to make work.
>> >> > > >
>> >> > > > That said, I'm now finding that, on P3B-F, SeaBIOS hangs when it tries
>> >> > > > to relocate itself as part of its usual chores. Having just learned
>> >> > > > git bisect, I decided to try it out.
>> >> > > >
>> >> > > > It was commit 3b02006afe8a85477dafa1bd149f1f0dba02afc7 [1] that broke
>> >> > > > my SeaBIOS. It doesn't affect my newer toy the P8Z77-M as much as
>> >> > > > P3B-F, but I still want to blame that, and probably the very next
>> >> > > > commit as well, as they both deal with some very modern aspects of PCI
>> >> > > > that well postdate the 440BX.
>> >> > > >
>> >> > > > Is there anything we can do to fix 3b02006afe?
>> >> > >
>> >> > > I commented in the change-set [1] to make the author and reviewers aware
>> >> > > of this issue and referenced your list message, and asked them to comment here.
>> >> > >
>> >> > > Could you please provide the debug log of coreboot and SeaBIOS?
>> >> >
>> >> > As Paul mentioned, can you please provide the debug logs for coreboot
>> >> > and SeaBIOS both with ToT coreboot and with HEAD set before the change
>> >> > 3b02006afe where it does not hang? Thanks!
>> >> >
>> >> > >
>> >> > > > Meanwhile I ported the P3B-F board enable to flashrom [2], which got a
>> >> > > > heavy workout during this bisect, through vendor firmware and both
>> >> > > > good and bad builds of coreboot. In all cases I can flash internal, no
>> >> > > > longer having to haul out my P2B-LS just to use it as a flasher.
>> >> > > >
>> >> > > > Enjoy this long overdue board enable. If it gets submitted, I'll
>> >> > > > retract the ramstage hack[3] doing the same as redundant.
>> >> > >
>> >> > > Very nice! It’s always amazing, how after so many years, when the vendor
>> >> > > already stopped supporting the device, the community still supports the
>> >> > > device and improves the firmware showing that Free Software is the more
>> >> > > sustainable way.
>> >> > >
>> >> > > Kind regards,
>> >> > >
>> >> > > Paul
>> >> > >
>> >> > > > [1] https://review.coreboot.org/c/coreboot/+/39486
>> >> > > > [2] https://review.coreboot.org/c/flashrom/+/41354
>> >> > > > [3] https://review.coreboot.org/c/coreboot/+/41224
>> >> > > _______________________________________________
>> >> > > coreboot mailing list -- coreboot@coreboot.org
>> >> > > To unsubscribe send an email to coreboot-leave@coreboot.org
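The e820 entry ending at 000003ffffffffff that Keith reports in the quoted thread can be sanity-checked against the address width. This is a minimal Python sketch, not coreboot or SeaBIOS code. The 36-bit physical address width is an assumption, chosen because the upper memory window printed in the resource log of this thread (Base: 100000000, Size: f00000000) ends exactly at 2^36. The function name `end_exceeds_width` is hypothetical.

```python
def end_exceeds_width(end_inclusive, phys_bits):
    """True if the last byte of a region lies beyond a phys_bits-wide address space."""
    return end_inclusive >= (1 << phys_bits)

# Assumption: 36-bit physical addressing, inferred from the logged window below.
ASSUMED_PHYS_BITS = 36

# Upper memory window from the resource allocator log: base + size.
window_end = 0x100000000 + 0xf00000000
print(hex(window_end), window_end == 1 << ASSUMED_PHYS_BITS)

# End address of the e820 entry reported in the thread.
reported_end = 0x000003ffffffffff
print("bits needed:", reported_end.bit_length())
print("outside assumed address space:",
      end_exceeds_width(reported_end, ASSUMED_PHYS_BITS))
```

The reported end address needs 42 bits, so under the 36-bit assumption the entry reaches well past anything the allocator windows cover.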
Unfortunately it seems a lot of boards are affected by this. A88XM-E and Lenovo G505S (AMD fam15h) also got broken: they rarely succeed at booting - and, when they do, no boot devices are available (virtual floppies too, for some reason) - except coreinfo/tint secondary payloads which became prone to freezing. I attach the A88XM-E logs I've been able to obtain with USB FT232H:
1) ok_e6fb1344ed9188e19be4b54bdf1a76680b8c4523.txt - last coreboot repo's revision where all the stuff works
2) fail_1_3b02006afe8a85477dafa1bd149f1f0dba02afc7.txt - this commit got the boards broken for the first time
3) fail_2_6b95507ec5b087658178a325bdc68570bc48bb20.txt - this is a log for coreboot's master top
For some reason logs for 2) and 3) always stop after "PCI: 00:12.2 EHCI Debug Port hook triggered".
I hope these commits could be reverted before we figure out what's going on with them. Good thing we've noticed it fast enough.
Best regards, Mike Banon
On Thu, May 14, 2020 at 8:47 PM Keith Hui buurin@gmail.com wrote:
Hi guys,
31ab7de51a is CB:41368, cherry picked into my local repo.
Turns out I have to back out all four of Furquan's patches (CB:39486~39489) for my board to boot normally again.
Thoughts?
I'll now get a log with everything in at SPEW.
On Thu, May 14, 2020 at 1:05 PM Aaron Durbin adurbin@google.com wrote:
Keith, is it possible to have the console log level set to SPEW? I'm not seeing the full logs to piece it all together.
Allocating resources... Reading resources... Setting RAM size to 768 MB PNP: 03f0.8 missing read_resources Done reading resources. Resource allocator: DOMAIN: 0000 - Pass 1 (gathering requirements) Resource allocator: DOMAIN: 0000 - Pass 2 (allocating resources) Resource ranges: Base: 1000, Size: d000, Tag: 100 Base: f000, Size: 1000, Tag: 100 Resource ranges: Base: 0, Size: ff800000, Tag: 200 Base: 100000000, Size: f00000000, Tag: 100200 Resource ranges: Base: 10000000, Size: 8000000, Tag: 1200 Resource ranges: Base: 18000000, Size: 1100000, Tag: 200
This is the memory address space: Base: 0, Size: ff800000, Tag: 200 Base: 100000000, Size: f00000000, Tag: 100200
Those are valid ranges to choose dynamic resources from.
PCI: 00:00.0 10 <- [0x0000000000 - 0x000fffffff] size 0x10000000 gran 0x1c prefmem
I see 'Setting RAM size to 768 MB' which means I would expect to see a hole in the ranges representing 768MiB.
that would be bad. I don't know what commit '31ab7de51a' is, but it might not contain the CB:41368. Having SPEW logs would be helpful.
Also, what mainboard Kconfig are you selecting for p3bf? src/mainboard/asus/p2b ?
On Thu, May 14, 2020 at 10:42 AM Keith Hui buurin@gmail.com wrote:
(Temporarily leaving the list out)
Hi Aaron,
Here is a log with everything including CB:41368 included. I'll get this log out to you first, while I try a build with all problem commits left out.
Thanks Keith
On Thu, May 14, 2020 at 12:53 AM Aaron Durbin adurbin@google.com wrote:
On Wed, May 13, 2020 at 10:51 PM Keith Hui buurin@gmail.com wrote:
Hi guys,
I tested these fixes on my board, and I have to say there's still something wrong. They did address the hang or reset in SeaBIOS I first described, but now either my ATA hard drive failed to boot (it tried to hand off to GRUB on my drive, but didn't get there), or it can't find the option ROM of my video card, meaning no display.
Now I want to try the other way, testing a build with all changes related to the problem backed out instead. So besides the one I first identified, what other related patches should I try backing out?
Just go to the parent of the identified patch. As for the other symptoms you are seeing, I'd love to see logs with the patches we identified so we can root cause.
Thanks.
-Aaron
On Wed, May 13, 2020 at 11:54 PM Furquan Shaikh furquan.m.shaikh@gmail.com wrote:
Similar fix for i440x: https://review.coreboot.org/c/coreboot/+/41368
On Wed, May 13, 2020 at 11:29 AM Aaron Durbin adurbin@google.com wrote: > > i440x chipset is doing things in the wrong way like sandybridge. I uploaded this fix for sandy: https://review.coreboot.org/c/coreboot/+/41364 We'll need to do the equivalent for i440x. > > On Wed, May 13, 2020 at 11:13 AM Aaron Durbin adurbin@google.com wrote: >> >> OK. I'll take a look at your logs and see what's going on. The patch link I sent was based off of someone else's mainboard logs. >> >> On Wed, May 13, 2020 at 10:59 AM Keith Hui buurin@gmail.com wrote: >>> >>> Hi Aaron, >>> >>> It didn't help. There still a way out of whack entry in the coreboot >>> table and e820 entry ending at 000003ffffffffff, which I think have >>> more to do than the 41363's scope. >>> >>> Keith >>> >>> On Wed, May 13, 2020 at 12:24 PM Aaron Durbin adurbin@google.com wrote: >>> > >>> > I think the following patch will fix things up: https://review.coreboot.org/c/coreboot/+/41363 Please let me know. >>> > >>> > On Wed, May 13, 2020 at 8:43 AM Keith Hui buurin@gmail.com wrote: >>> >> >>> >> Thanks Furquan. >>> >> >>> >> Here are 3 logs. Log 1 is at the commit just before the problem. Log 2 >>> >> is at the problem commit. Log 3 is at the current master, if that's >>> >> what you meant by ToT. >>> >> >>> >> I'm using SeaBIOS 1.13.0, compiled once using the attached .config >>> >> before taking these logs. All 3 runs are taken using the same SeaBIOS >>> >> binary. >>> >> >>> >> Then I recompiled SeaBIOS with CONFIG_RELOCATE_INIT off, replaced the >>> >> payload used in run 3, and took an extra run. In this case the board >>> >> reset on its own at "Scanning option roms", looping infinitely. >>> >> >>> >> Hope this helps >>> >> Keith >>> >> >>> >> On Wed, May 13, 2020 at 7:38 AM Furquan Shaikh >>> >> furquan.m.shaikh@gmail.com wrote: >>> >> > >>> >> > Thanks for the report Keith! 
>>> >> > >>> >> > On Wed, May 13, 2020 at 3:42 AM Paul Menzel pmenzel@molgen.mpg.de wrote: >>> >> > > >>> >> > > Dear Keith, >>> >> > > >>> >> > > >>> >> > > Am 13.05.20 um 05:21 schrieb Keith Hui: >>> >> > > >>> >> > > > I am still refining the P2B family of boards, now including the >>> >> > > > infamous P3B-F with an unusual appetite for hacks to make work. >>> >> > > > >>> >> > > > That said, I'm now finding that, on P3B-F, SeaBIOS hangs when it tries >>> >> > > > to relocate itself as part of its usual chores. Having just learned >>> >> > > > git bisect, I decided to try it out. >>> >> > > > >>> >> > > > It was commit 3b02006afe8a85477dafa1bd149f1f0dba02afc7 [1] that broke >>> >> > > > my SeaBIOS. It doesn't affect my newer toy the P8Z77-M as much as >>> >> > > > P3B-F, but I still want to blame that, and probably the very next >>> >> > > > commit as well, as they both deal with some very modern aspects of PCI >>> >> > > > that well predates the 440BX. >>> >> > > > >>> >> > > > Is there anything we can do to fix 3b02006afe? >>> >> > > >>> >> > > I commented in the change-set [1] to make the author and reviewers aware >>> >> > > of this issue and referenced your list message, and ask to comment here. >>> >> > > >>> >> > > Could you please provide the debug log of coreboot and SeaBIOS? >>> >> > >>> >> > As Paul mentioned, can you please provide the debug logs for coreboot >>> >> > and SeaBIOS both with ToT coreboot and with HEAD set before the change >>> >> > 3b02006afe where it does not hang? Thanks! >>> >> > >>> >> > > >>> >> > > >>> >> > > > Meanwhile I ported the P3B-F board enable to flashrom [2], which got a >>> >> > > > heavy workout during this bisect, through vendor firmware and both >>> >> > > > good and bad builds of coreboot. In all cases I can flash internal, no >>> >> > > > longer having to haul out my P2B-LS just to use it as a flasher. >>> >> > > > >>> >> > > > Enjoy this long overdue board enable. 
If it gets submitted, I'll >>> >> > > > retract the ramstage hack[3] doing the same as redundant. >>> >> > > >>> >> > > Very nice! It’s always amazing, how after so many years, when the vendor >>> >> > > already stopped supporting the device, the community still supports the >>> >> > > device and improves the firmware showing that Free Software is the more >>> >> > > sustainable way. >>> >> > > >>> >> > > >>> >> > > Kind regards, >>> >> > > >>> >> > > Paul >>> >> > > >>> >> > > >>> >> > > > [1] https://review.coreboot.org/c/coreboot/+/39486 >>> >> > > > [2] https://review.coreboot.org/c/flashrom/+/41354 >>> >> > > > [3] https://review.coreboot.org/c/coreboot/+/41224 >>> >> > > _______________________________________________ >>> >> > > coreboot mailing list -- coreboot@coreboot.org >>> >> > > To unsubscribe send an email to coreboot-leave@coreboot.org >>> >> _______________________________________________ >>> >> coreboot mailing list -- coreboot@coreboot.org >>> >> To unsubscribe send an email to coreboot-leave@coreboot.org
coreboot mailing list -- coreboot@coreboot.org To unsubscribe send an email to coreboot-leave@coreboot.org
On Thu, May 14, 2020 at 2:46 PM Mike Banon mikebdp2@gmail.com wrote:
Unfortunately it seems a lot of boards are affected by this. A88XM-E and Lenovo G505S (AMD fam15h) also got broken: they rarely succeed at booting - and, when they do, no boot devices are available (virtual floppies too, for some reason) - except coreinfo/tint secondary payloads which became prone to freezing. I attach the A88XM-E logs I've been able to obtain with USB FT232H:
1) ok_e6fb1344ed9188e19be4b54bdf1a76680b8c4523.txt - last coreboot repo's revision where all the stuff works
2) fail_1_3b02006afe8a85477dafa1bd149f1f0dba02afc7.txt - this commit got the boards broken for the first time
3) fail_2_6b95507ec5b087658178a325bdc68570bc48bb20.txt - this is a log for coreboot's master top
For some reason logs for 2) and 3) always stop after "PCI: 00:12.2 EHCI Debug Port hook triggered".
I hope these commits could be reverted before we figure out what's going on with them. Good thing we've noticed it fast enough.
Thanks, Mike. The amd chipset code (all of it from what I can tell) is fundamentally broken and at odds with all of the resource allocation flow. They worked previously because dynamic resources were being assigned using an algorithm that just assumed there weren't collisions, and that was done w/o all the necessary info required for making the proper decisions regarding dynamic resource allocation.
I landed the other chipsets' fixes, but the amd chipset code is going to take a lot more to fix. Would you be willing to test patches as they are crafted? Given the size of the problem and how gnarly the amd chipset code is, it's going to take some time, so I think we do need to revert the allocator changes until we can do some housekeeping.
-Aaron
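The point above about collisions can be illustrated with a toy allocator. This is a minimal Python sketch under stated assumptions, not the coreboot allocator: the top-down, size-aligned placement policy is invented for the example, the fixed MMIO range near 4 GiB is hypothetical, and the names `allocate`, `overlaps` and `align_down` are made up. It shows that an allocator can only avoid fixed ranges that the chipset code actually reports.

```python
def align_down(value, alignment):
    """Round value down to a multiple of alignment (a power of two)."""
    return value & ~(alignment - 1)

def overlaps(a, b):
    """True if two (base, size) ranges share at least one address."""
    return a[0] < b[0] + b[1] and b[0] < a[0] + a[1]

def allocate(window, fixed, size):
    """Place a size-aligned block as high as possible in window, avoiding fixed ranges.

    Returns the chosen base address, or None if nothing fits.
    """
    w_base, w_size = window
    candidate = align_down(w_base + w_size - size, size)
    while candidate >= w_base:
        hits = [f for f in fixed if overlaps((candidate, size), f)]
        if not hits:
            return candidate
        # Retry just below the lowest fixed range we collided with.
        candidate = align_down(min(f[0] for f in hits) - size, size)
    return None

MiB = 1 << 20
WINDOW = (0x0, 0x100000000)            # 32-bit memory space
RAM = (0x0, 768 * MiB)                 # RAM size taken from the thread
HYPOTHETICAL_MMIO = (0xfec00000, 0x01400000)  # made-up fixed range below 4 GiB
SIZE = 0x10000000                      # 256 MiB request, as in the logged BAR

informed = allocate(WINDOW, [RAM, HYPOTHETICAL_MMIO], SIZE)
uninformed = allocate(WINDOW, [], SIZE)

print(f"fixed ranges reported:  {informed:#x}")
print(f"nothing reported:       {uninformed:#x}")
print("uninformed choice collides:",
      overlaps((uninformed, SIZE), HYPOTHETICAL_MMIO))
```

With the fixed ranges reported the block is placed at 0xe0000000. With nothing reported the same policy picks 0xf0000000, on top of the hypothetical fixed range, which is the kind of silent collision an allocator cannot detect without complete information.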
Hi Aaron,
You may want to check the QEMU-q35 target as well:
Automatic boot test returned (PASS/FAIL/TOTAL): 2/2/4
Emulation targets:
"QEMU x86 q35/ich9" using payload TianoCore : FAIL : https://lava.9esec.io/r/3427
"QEMU x86 q35/ich9" using payload SeaBIOS : FAIL : https://lava.9esec.io/r/3426
"QEMU x86 i440fx/piix4" using payload SeaBIOS : SUCCESS : https://lava.9esec.io/r/3425
"QEMU AArch64" using payload LinuxBoot_u-root_kexec : SUCCESS : https://lava.9esec.io/r/3424
Thanks Keith
coreboot mailing list -- coreboot@coreboot.org To unsubscribe send an email to coreboot-leave@coreboot.org
On Thu, May 14, 2020 at 4:44 PM Keith Hui buurin@gmail.com wrote:
Hi Aaron,
You may want to check the QEMU-q35 target as well:
Automatic boot test returned (PASS/FAIL/TOTAL): 2/2/4
Emulation targets:
"QEMU x86 q35/ich9" using payload TianoCore : FAIL : https://lava.9esec.io/r/3427
"QEMU x86 q35/ich9" using payload SeaBIOS : FAIL : https://lava.9esec.io/r/3426
"QEMU x86 i440fx/piix4" using payload SeaBIOS : SUCCESS : https://lava.9esec.io/r/3425
"QEMU AArch64" using payload LinuxBoot_u-root_kexec : SUCCESS : https://lava.9esec.io/r/3424
Ya. Those are fixed by the same patches that fixed your device, Keith.
Thanks Keith
On Thu, May 14, 2020 at 5:47 PM Aaron Durbin adurbin@google.com wrote:
On Thu, May 14, 2020 at 2:46 PM Mike Banon mikebdp2@gmail.com wrote:
Unfortunately it seems a lot of boards are affected by this. A88XM-E and Lenovo G505S (AMD fam15h) also got broken: they rarely succeed at booting - and, when they do, no boot devices are available (virtual floppies too, for some reason) - except coreinfo/tint secondary payloads which became prone to freezing. I attach the A88XM-E logs I've been able to obtain with USB FT232H:
1) ok_e6fb1344ed9188e19be4b54bdf1a76680b8c4523.txt - last coreboot repo's revision where all the stuff works
2) fail_1_3b02006afe8a85477dafa1bd149f1f0dba02afc7.txt - this commit got the boards broken for the first time
3) fail_2_6b95507ec5b087658178a325bdc68570bc48bb20.txt - this is a log for coreboot's master top
For some reason logs for 2) and 3) always stop after "PCI: 00:12.2 EHCI Debug Port hook triggered".
I hope these commits can be reverted until we figure out what's going on with them. Good thing we noticed it quickly.
Thanks, Mike. The AMD chipset code (all of it, from what I can tell) is fundamentally broken and at odds with the resource allocation flow. Those boards worked previously because dynamic resources were assigned by an algorithm that simply assumed there were no collisions, and that was done without all the information required to make proper decisions about dynamic resource allocation.
I landed the other chipsets' fixes, but the AMD chipset code is going to take a lot more to fix. Would you be willing to test patches as they are crafted? Given the size of the problem, as well as how gnarly the AMD chipset code is, it's going to take some time, so I think we do need to revert the allocator changes until we can do some housekeeping.
-Aaron
Best regards, Mike Banon
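As an illustration of the collision problem Aaron describes, here is a minimal sketch of carving fixed ranges out of an address window before any dynamic resource is placed. It is not coreboot's allocator; `subtract_ranges` is a hypothetical name. The window (base 0, size 0xff800000) and the 768 MB RAM size come from the log quoted earlier in this thread; treating RAM as a single fixed range starting at 0 is an assumption made for the example.

```python
MIB = 1 << 20

def subtract_ranges(window, reserved):
    """Return the parts of `window` not covered by any range in `reserved`.

    All ranges are (base, size) tuples. Hypothetical helper for illustration.
    """
    free = []
    cursor = window[0]
    end = window[0] + window[1]
    for base, size in sorted(reserved):
        lo = max(base, cursor)
        hi = min(base + size, end)
        if lo >= end:
            break            # this and all later ranges lie above the window
        if hi <= cursor:
            continue         # range already fully consumed
        if lo > cursor:
            free.append((cursor, lo - cursor))
        cursor = hi
    if cursor < end:
        free.append((cursor, end - cursor))
    return free

# Memory window from the quoted log; RAM placement is assumed.
window = (0x0, 0xFF800000)
ram = (0x0, 768 * MIB)
free = subtract_ranges(window, [ram])
print([(hex(b), hex(s)) for b, s in free])  # [('0x30000000', '0xcf800000')]
```

If the chipset code does not report RAM as a fixed resource, the "hole" never appears and the whole window looks free, which is how a BAR can end up assigned on top of RAM.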
I was just brainstorming with Furquan. He did push the revert changes, but we were scheming on a patch that I was hoping affected parties could try in conjunction with https://review.coreboot.org/c/coreboot/+/41363. Basically, we'll allocate top down like the previous allocator did, hoping for no collisions. Let's try that and see where we land. Regardless, we need to fix this AMD chipset code, as it's a major liability.
-Aaron
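The top-down idea mentioned above can be sketched as follows. This is only an illustration under stated assumptions, not the patch under discussion: `allocate_top_down` is a hypothetical name, alignment is assumed to be a power of two, and the free range and the 0x10000000 request size are borrowed from the log quoted earlier in the thread.

```python
def allocate_top_down(free, size, align):
    """Return the highest aligned base within `free` that fits `size`.

    `free` is a list of (base, length) tuples; `align` must be a power of two.
    Returns None when nothing fits. Hypothetical helper for illustration.
    """
    for base, length in sorted(free, reverse=True):
        top = base + length
        candidate = (top - size) & ~(align - 1)  # align down
        if candidate >= base:
            return candidate
    return None

# One free range above an assumed 768 MB of RAM, ending at 0xff800000.
free = [(0x30000000, 0xCF800000)]
bar = allocate_top_down(free, size=0x10000000, align=0x10000000)
print(hex(bar))  # 0xe0000000
```

Placing resources as high as possible keeps them away from low memory, which is why it tends to avoid collisions even when fixed ranges are not reported correctly; it does not make the missing information any less of a problem.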
I have uploaded 2 changes on top of Aaron's change. Can you please give these three changes a try:
https://review.coreboot.org/c/coreboot/+/41363
https://review.coreboot.org/c/coreboot/+/41418
https://review.coreboot.org/c/coreboot/+/41419
Thank you!
- Furquan
The result is still the same even with the three changes applied (randomly, it either can't boot or finds no boot devices). There is one positive effect, though: the USB FT232H log no longer stops, so I'm finally able to share a full log of a failing boot. Please compare these two logs:
1) ok_e6fb1344ed9188e19be4b54bdf1a76680b8c4523.txt - boot log for the last "working" commit (before the allocator changes)
2) 3fixes.txt - boot log with the 3 changes applied on top of 6b95507ec5b087658178a325bdc68570bc48bb20 (after the allocator changes)
I hope this comparison gives enough clues about how to fix it further, and I'll happily test any new changes aimed at fixing this.
Best regards, Mike Banon
On Fri, May 15, 2020 at 2:44 AM Furquan Shaikh furquan.m.shaikh@gmail.com wrote:
I have uploaded 2 changes on top of Aaron's change. Can you please give these three changes a try: https://review.coreboot.org/c/coreboot/+/41363 https://review.coreboot.org/c/coreboot/+/41418 https://review.coreboot.org/c/coreboot/+/41419
Thank you!
- Furquan
On Thu, May 14, 2020 at 4:16 PM Aaron Durbin adurbin@google.com wrote:
On Thu, May 14, 2020 at 3:46 PM Aaron Durbin adurbin@google.com wrote:
On Thu, May 14, 2020 at 2:46 PM Mike Banon mikebdp2@gmail.com wrote:
Unfortunately it seems a lot of boards are affected by this. A88XM-E and Lenovo G505S (AMD fam15h) also got broken: they rarely succeed at booting - and, when they do, no boot devices are available (virtual floppies too, for some reason) - except coreinfo/tint secondary payloads which became prone to freezing. I attach the A88XM-E logs I've been able to obtain with USB FT232H:
1) ok_e6fb1344ed9188e19be4b54bdf1a76680b8c4523.txt - last coreboot repo's revision where all the stuff works
2) fail_1_3b02006afe8a85477dafa1bd149f1f0dba02afc7.txt - this commit got the boards broken for the first time
3) fail_2_6b95507ec5b087658178a325bdc68570bc48bb20.txt - this is a log for coreboot's master top
For some reason logs for 2) and 3) always stop after "PCI: 00:12.2 EHCI Debug Port hook triggered".
I hope these commits could be reverted before we figure out what's going on with them. Good thing we've noticed it fast enough.
Thanks, Mike. The amd chipset code (all of it from what I can tell) is fundamentally broken and at odds with all of the resource allocation flow. They worked previously because dynamic resources were being assigned using an algorithm that just assumed there weren't collisions, and that was done w/o all the necessary info required for making the proper decisions regarding dynamic resource allocation.
I landed the other chipsets' fixes, but the amd chipset code is going to take a lot more to fix. Would you be willing to test patches as they are crafted? Given the size of the problem, as well as how gnarly the amd chipset code is, it's going to take some time, so I think we do need to revert the allocator changes until we can do some housekeeping.
I just was brainstorming with Furquan. He did push the revert changes, but we were scheming on a patch that I was hoping affected parties could try in conjunction with https://review.coreboot.org/c/coreboot/+/41363. Basically we'll allocate top down like the previous allocator did hoping for no collisions. Let's try that, and see where we land. Regardless we need to fix this amd chipset code as it's a major liability.
-Aaron
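To make the "allocate top down" idea above concrete, here is a minimal sketch in C. It is not coreboot's allocator: `struct free_range` and `alloc_top_down()` are hypothetical names invented for illustration, and unlike the old behaviour described above it only places a request inside ranges that are known to be free, so it cannot collide with anything that was reported beforehand.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical free-range descriptor; not coreboot's own structure. */
struct free_range {
	uint64_t base;
	uint64_t size;
};

/*
 * Place a request of `size` bytes, aligned to `align` (a power of two),
 * as high as possible inside the free ranges without going above `limit`
 * (the highest usable address, inclusive).
 * Returns the chosen base, or UINT64_MAX when nothing fits.
 */
static uint64_t alloc_top_down(const struct free_range *ranges, size_t count,
			       uint64_t size, uint64_t align, uint64_t limit)
{
	uint64_t best = UINT64_MAX;

	assert(size != 0 && align != 0 && (align & (align - 1)) == 0);

	for (size_t i = 0; i < count; i++) {
		uint64_t end, base;

		if (ranges[i].size == 0)
			continue;

		/* Last usable byte of this range, clipped to the limit. */
		end = ranges[i].base + ranges[i].size - 1;
		if (end > limit)
			end = limit;
		if (end < ranges[i].base || end - ranges[i].base + 1 < size)
			continue;

		/* Highest aligned base with base + size - 1 <= end. */
		base = (end - size + 1) & ~(align - 1);
		if (base < ranges[i].base)
			continue;

		if (best == UINT64_MAX || base > best)
			best = base;
	}
	return best;
}
```

Fed with the two memory ranges from Keith's log (base 0 / size ff800000 and base 100000000 / size f00000000), a 256 MiB request limited to 32-bit addresses would land at 0xe0000000 in this sketch, well away from low RAM.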
Best regards, Mike Banon
On Thu, May 14, 2020 at 8:47 PM Keith Hui buurin@gmail.com wrote:
Hi guys,
31ab7de51a is CB:41368, cherry picked into my local repo.
Turns out I have to back out all four of Furquan's patches (CB:39486~39489) for my board to boot normally again.
Thoughts?
I'll now get a log with everything in at SPEW.
On Thu, May 14, 2020 at 1:05 PM Aaron Durbin adurbin@google.com wrote:
Keith, is it possible to have the console log level set to SPEW? I'm not seeing the full logs to piece it all together.
Allocating resources...
Reading resources...
Setting RAM size to 768 MB
PNP: 03f0.8 missing read_resources
Done reading resources.
Resource allocator: DOMAIN: 0000 - Pass 1 (gathering requirements)
Resource allocator: DOMAIN: 0000 - Pass 2 (allocating resources)
Resource ranges:
 Base: 1000, Size: d000, Tag: 100
 Base: f000, Size: 1000, Tag: 100
Resource ranges:
 Base: 0, Size: ff800000, Tag: 200
 Base: 100000000, Size: f00000000, Tag: 100200
Resource ranges:
 Base: 10000000, Size: 8000000, Tag: 1200
Resource ranges:
 Base: 18000000, Size: 1100000, Tag: 200

This is the memory address space:
 Base: 0, Size: ff800000, Tag: 200
 Base: 100000000, Size: f00000000, Tag: 100200
Those are valid ranges to choose dynamic resources from.
PCI: 00:00.0 10 <- [0x0000000000 - 0x000fffffff] size 0x10000000 gran 0x1c prefmem
I see 'Setting RAM size to 768 MB' which means I would expect to see a hole in the ranges representing 768MiB.
that would be bad. I don't know what commit '31ab7de51a' is, but it might not contain the CB:41368. Having SPEW logs would be helpful.
Also, what mainboard Kconfig are you selecting for p3bf? src/mainboard/asus/p2b ?
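The expectation about a "hole" for RAM can be checked with a little arithmetic. The sketch below is only an illustration under simplifying assumptions: `struct span`, `spans_overlap()` and `carve_low_ram()` are made-up names, RAM is treated as one contiguous block starting at address 0, and the legacy 640K-1M hole is ignored. It shows why a BAR placed at [0x0 - 0x0fffffff] would collide with 768 MiB of RAM, and what the first memory window would look like once RAM is carved out of it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical address span; `size` is in bytes. */
struct span {
	uint64_t base;
	uint64_t size;
};

/* True when the two spans share at least one address. */
static bool spans_overlap(struct span a, struct span b)
{
	if (a.size == 0 || b.size == 0)
		return false;
	return a.base <= b.base + b.size - 1 &&
	       b.base <= a.base + a.size - 1;
}

/*
 * Remove RAM occupying [0, ram_size) from a window, assuming RAM starts
 * at address 0. Returns what is left of the window for dynamic resources.
 */
static struct span carve_low_ram(struct span window, uint64_t ram_size)
{
	uint64_t end = window.base + window.size; /* exclusive end */

	assert(end >= window.base); /* no wraparound in this toy model */

	if (ram_size <= window.base)
		return window;          /* RAM ends below the window */
	if (ram_size >= end) {
		window.base = end;      /* RAM covers the whole window */
		window.size = 0;
		return window;
	}
	window.base = ram_size;
	window.size = end - ram_size;
	return window;
}
```

With 768 MiB (0x30000000) of RAM, the window "Base: 0, Size: ff800000" would shrink to base 0x30000000, size 0xcf800000 in this model, and the 256 MiB prefmem assignment shown in the log would no longer fit at address 0.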
Looking at your change 41369 (soc/amd/stoneyridge: add resources during read_resources()), I tried a similar change on top of your 3 fixes above, and surprisingly it worked on the first try: now I'm able to see the boot devices and floppies. A new boot log is attached. After testing it more (it should be able to boot 100% of the time), I'm going to submit it to review.coreboot.org soon for your review. We will also need a similar change for family14 and family16kb if this one succeeds.
diff --git a/src/northbridge/amd/agesa/family15tn/northbridge.c b/src/northbridge/amd/agesa/family15tn/northbridge.c
index 9d41e7a1f1..0194ea82ea 100644
--- a/src/northbridge/amd/agesa/family15tn/northbridge.c
+++ b/src/northbridge/amd/agesa/family15tn/northbridge.c
@@ -666,6 +666,8 @@ static void domain_set_resources(struct device *dev)
 	u32 reset_memhole = 1;
 #endif
 
+	domain_read_resources(dev);
+
 	pci_tolm = 0xffffffffUL;
 	for (link = dev->link_list; link; link = link->next) {
 		pci_tolm = find_pci_tolm(link);
@@ -749,17 +751,18 @@ static void domain_set_resources(struct device *dev)
 	}
 
 	add_uma_resource_below_tolm(dev, 7);
-
+/*
 	for (link = dev->link_list; link; link = link->next) {
 		if (link->children) {
 			assign_resources(link);
 		}
 	}
+*/
 }
 
 static struct device_operations pci_domain_ops = {
-	.read_resources = domain_read_resources,
-	.set_resources = domain_set_resources,
+	.read_resources = domain_set_resources,
+	.set_resources = pci_domain_set_resources,
 	.scan_bus = pci_domain_scan_bus,
 };
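Why moving the reporting into read_resources might matter can be shown with a toy model. This is not coreboot's device API: `struct toy_domain`, `struct toy_ops`, `report_ram()` and `toy_flow()` are all hypothetical, and the only assumption carried over from the thread is the ordering, namely that the allocator makes its decisions after read_resources and before set_resources.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_MAX_FIXED 8

/* One fixed (non-movable) address range, e.g. RAM. */
struct toy_fixed {
	uint64_t base;
	uint64_t size;
};

/* Hypothetical stand-in for a PCI domain device. */
struct toy_domain {
	struct toy_fixed fixed[TOY_MAX_FIXED];
	size_t nfixed;
};

/* Hypothetical stand-in for the two callbacks touched by the diff. */
struct toy_ops {
	void (*read_resources)(struct toy_domain *d);
	void (*set_resources)(struct toy_domain *d);
};

/* Report 768 MiB of RAM at address 0 as a fixed range. */
static void report_ram(struct toy_domain *d)
{
	assert(d->nfixed < TOY_MAX_FIXED);
	d->fixed[d->nfixed].base = 0;
	d->fixed[d->nfixed].size = 768ULL << 20;
	d->nfixed++;
}

static void do_nothing(struct toy_domain *d)
{
	(void)d;
}

/*
 * Run the three phases in order and return how many fixed ranges were
 * visible at the moment the allocator would pick addresses.
 */
static size_t toy_flow(const struct toy_ops *ops, struct toy_domain *d)
{
	size_t seen_by_allocator;

	ops->read_resources(d);
	seen_by_allocator = d->nfixed; /* allocation decisions happen here */
	ops->set_resources(d);
	return seen_by_allocator;
}
```

In this model, a chipset that calls `report_ram()` from its set_resources callback leaves the allocator with zero fixed ranges to avoid, while one that calls it from read_resources lets the allocator see RAM before it places anything.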
UPDATE: the fam15tn A88XM-E boots well when all 3 fixes are applied together with the patch above (applied separately, they are not enough to get it working); I have tested it enough. However, when I tried a similar fix for the fam16kb AM1I-A, it got stuck in a boot loop (see the attached log). Maybe I'm doing something wrong and it worked for fam15tn just by coincidence. Please take a look at the change for further review: https://review.coreboot.org/c/coreboot/+/41431
On Fri, May 15, 2020 at 1:30 PM Mike Banon mikebdp2@gmail.com wrote:
Looking at your change 41369 - soc/amd/stoneyridge: add resources during read_resources() - I tried to do a similar style change on top of your 3 fixes above, and surprisingly it worked at first try - now I'm able to see the boot devices and floppies. New bootlog is attached. After testing it more (should be able to boot 100% of times) I'm going to submit it to review coreboot org soon, for your review - and also we will need to do a similar change for family14 and family16kb if this one succeeds.
diff --git a/src/northbridge/amd/agesa/family15tn/northbridge.c b/src/northbridge/amd/agesa/family15tn/northbridge.c index 9d41e7a1f1..0194ea82ea 100644 --- a/src/northbridge/amd/agesa/family15tn/northbridge.c +++ b/src/northbridge/amd/agesa/family15tn/northbridge.c @@ -666,6 +666,8 @@ static void domain_set_resources(struct device *dev) u32 reset_memhole = 1; #endif
domain_read_resources(dev);
pci_tolm = 0xffffffffUL; for (link = dev->link_list; link; link = link->next) { pci_tolm = find_pci_tolm(link);
@@ -749,17 +751,18 @@ static void domain_set_resources(struct device *dev) }
add_uma_resource_below_tolm(dev, 7);
+/* for (link = dev->link_list; link; link = link->next) { if (link->children) { assign_resources(link); } } +*/ }
static struct device_operations pci_domain_ops = {
.read_resources = domain_read_resources,
.set_resources = domain_set_resources,
.read_resources = domain_set_resources,
.set_resources = pci_domain_set_resources, .scan_bus = pci_domain_scan_bus,
};
On Fri, May 15, 2020 at 12:10 PM Mike Banon mikebdp2@gmail.com wrote:
Although it's still the same result even with three changes (either can't boot or no boot devices, randomly) - there is a positive effect that USB FT232H log now 't stop and I'm finally able to share a full log for a boot problem. Please compare these two logs:
- ok_e6fb1344ed9188e19be4b54bdf1a76680b8c4523.txt - boot log for last
"working" commit (before the allocator changes) 2) 3fixes.txt - boot log with 3 changes applied on top of 6b95507ec5b087658178a325bdc68570bc48bb20 (after the allocator changes) Hope this comparison will give enough clues about how to fix it further - and I'll happily test your new changes aimed on fixing this
Best regards, Mike Banon
On Fri, May 15, 2020 at 2:44 AM Furquan Shaikh furquan.m.shaikh@gmail.com wrote:
I have uploaded 2 changes on top of Aaron's change. Can you please give these three changes a try: https://review.coreboot.org/c/coreboot/+/41363 https://review.coreboot.org/c/coreboot/+/41418 https://review.coreboot.org/c/coreboot/+/41419
Thank you!
- Furquan
On Thu, May 14, 2020 at 4:16 PM Aaron Durbin adurbin@google.com wrote:
On Thu, May 14, 2020 at 3:46 PM Aaron Durbin adurbin@google.com wrote:
On Thu, May 14, 2020 at 2:46 PM Mike Banon mikebdp2@gmail.com wrote:
Unfortunately it seems a lot of boards are affected by this. The A88XM-E and Lenovo G505S (AMD fam15h) also broke: they rarely succeed at booting and, when they do, no boot devices are available (not even the virtual floppies, for some reason) except the coreinfo/tint secondary payloads, which have become prone to freezing. I attach the A88XM-E logs I've been able to obtain with USB FT232H:

1) ok_e6fb1344ed9188e19be4b54bdf1a76680b8c4523.txt - the last revision of the coreboot repo where everything works
2) fail_1_3b02006afe8a85477dafa1bd149f1f0dba02afc7.txt - the commit that first broke the boards
3) fail_2_6b95507ec5b087658178a325bdc68570bc48bb20.txt - a log for the top of coreboot's master

For some reason the logs for 2) and 3) always stop after "PCI: 00:12.2 EHCI Debug Port hook triggered".

I hope these commits can be reverted until we figure out what's going on with them. Good thing we noticed it quickly.
Thanks, Mike. The amd chipset code (all of it from what I can tell) is fundamentally broken and at odds with all of the resource allocation flow. They worked previously because dynamic resources were being assigned using an algorithm that just assumed there weren't collisions, and that was done w/o all the necessary info required for making the proper decisions regarding dynamic resource allocation.
I landed the other chipsets' fixes, but the amd chipset code is going to take a lot more to fix. Would you be willing to test patches as they are crafted? Given the size of the problem and how gnarly the amd chipset code is, it's going to take some time, so I think we do need to revert the allocator changes until we can do some housekeeping.
I just was brainstorming with Furquan. He did push the revert changes, but we were scheming on a patch that I was hoping affected parties could try in conjunction with https://review.coreboot.org/c/coreboot/+/41363. Basically we'll allocate top down like the previous allocator did hoping for no collisions. Let's try that, and see where we land. Regardless we need to fix this amd chipset code as it's a major liability.
-Aaron
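Aaron's point about collisions can be illustrated with a small sketch. Assuming a flat address space and a list of fixed ranges (all names here are hypothetical, not coreboot code), blind top-down placement simply takes the highest aligned slot, while a range-aware variant steps below any fixed range it would hit:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy model only; these types and helpers are not coreboot APIs. */
struct toy_range {
	uint64_t base;
	uint64_t size;
};

static bool toy_overlaps(struct toy_range a, struct toy_range b)
{
	return a.base < b.base + b.size && b.base < a.base + a.size;
}

/* Blind top-down placement: highest aligned base below 'limit', no checks. */
static uint64_t toy_place_top_down(uint64_t limit, uint64_t size, uint64_t align)
{
	assert(align != 0 && (align & (align - 1)) == 0);	/* power of two */
	assert(size != 0 && size <= limit);
	return (limit - size) & ~(align - 1);
}

/*
 * Range-aware placement: walk down from 'limit' until the candidate
 * clears every known fixed range. Returns false if nothing fits.
 */
static bool toy_place_checked(uint64_t limit, uint64_t size, uint64_t align,
			      const struct toy_range *fixed, size_t n,
			      uint64_t *base_out)
{
	uint64_t top = limit;

	while (top >= size) {
		struct toy_range cand = { toy_place_top_down(top, size, align), size };
		bool hit = false;

		for (size_t i = 0; i < n; i++) {
			if (toy_overlaps(cand, fixed[i])) {
				top = fixed[i].base;	/* retry below the blocker */
				hit = true;
				break;
			}
		}
		if (!hit) {
			*base_out = cand.base;
			return true;
		}
	}
	return false;
}
```

The blind variant only works when nothing fixed happens to sit at the top of the window. The checked variant needs the complete list of fixed ranges up front, which is the information the allocator did not have on these chipsets.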
Best regards, Mike Banon
Hello Mike,
IIUC, there are more things that will have to be looked at closely and reworked for the impacted AMD chipsets. As per the discussion on the IRC channel, I have prepared a patch series which does the following:
- Revert the new resource allocator changes: https://review.coreboot.org/c/coreboot/+/41411 https://review.coreboot.org/c/coreboot/+/41412 https://review.coreboot.org/c/coreboot/+/41413
- Split old resource allocator into its separate unit: https://review.coreboot.org/c/coreboot/+/41442
- Reland new allocator guarded by a Kconfig: https://review.coreboot.org/c/coreboot/+/41443
- Select old allocator for impacted chipsets and enable new allocator for all other boards: https://review.coreboot.org/c/coreboot/+/41444 https://review.coreboot.org/c/coreboot/+/41445
This should give us some more time to fix the impacted chipsets and keep the boards working in upstream.
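The idea of keeping both allocators behind a build-time option might look roughly like this in C. The macro `TOY_USE_NEW_ALLOCATOR` and the function names are made up for illustration. The real option is defined in the changes linked above:

```c
#include <assert.h>

/*
 * Hypothetical stand-in for the Kconfig option; the macro name is an
 * assumption for this sketch, not the symbol used in the patches.
 */
#ifndef TOY_USE_NEW_ALLOCATOR
#define TOY_USE_NEW_ALLOCATOR 0	/* impacted chipsets keep the old allocator */
#endif

enum toy_allocator { TOY_ALLOC_OLD, TOY_ALLOC_NEW };

static enum toy_allocator toy_run_old(void) { return TOY_ALLOC_OLD; }
static enum toy_allocator toy_run_new(void) { return TOY_ALLOC_NEW; }

/* Single entry point; the branch not taken can be dropped by the compiler. */
static enum toy_allocator toy_selected_allocator(void)
{
	assert(TOY_USE_NEW_ALLOCATOR == 0 || TOY_USE_NEW_ALLOCATOR == 1);
	if (TOY_USE_NEW_ALLOCATOR)
		return toy_run_new();
	return toy_run_old();
}
```

Using a plain `if` on the constant rather than `#if` keeps both code paths compiled, so neither implementation can silently bit-rot while the other is selected.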
On Fri, May 15, 2020 at 11:11 AM Mike Banon mikebdp2@gmail.com wrote:
UPDATE: the fam15tn A88XM-E boots well if all 3 fixes are applied together with the patch above (separately they are not enough to get it working); I have tested it enough. However, when I tried a similar fix for the fam16kb AM1I-A, it got stuck in a boot loop (see the attached log). Maybe I'm doing something wrong and it worked for fam15tn just by coincidence. Please take a look at this change for further review: https://review.coreboot.org/c/coreboot/+/41431
On Fri, May 15, 2020 at 1:30 PM Mike Banon mikebdp2@gmail.com wrote:
Looking at your change 41369 - soc/amd/stoneyridge: add resources during read_resources() - I tried a similar style of change on top of your 3 fixes above, and surprisingly it worked on the first try: now I'm able to see the boot devices and floppies. A new boot log is attached. After testing it more (it should be able to boot 100% of the time) I'm going to submit it to review.coreboot.org soon for your review. We will also need a similar change for family14 and family16kb if this one succeeds.
Hi again Furquan, I'm happy to tell you that both the f15tn A88XM-E and the f16kb AM1I-A boot fine with this chain of changes applied on top of master. Thanks a lot, I'm going to review it soon. Hopefully we'll later come up with a better solution than my weird https://review.coreboot.org/c/coreboot/+/41431 - I'll happily test any proposed changes aimed at fixing the V4 allocator for these chipsets/boards.
On Sat, May 16, 2020 at 2:41 AM Furquan Shaikh furquan.m.shaikh@gmail.com wrote:
Hello MIke,
IIUC, there are more things that will have to be looked at closely and reworked for the impacted AMD chipsets. As per the discussion on the IRC channel, I have prepared a patch series which does the following:
- Revert the new resource allocator changes:
https://review.coreboot.org/c/coreboot/+/41411 https://review.coreboot.org/c/coreboot/+/41412 https://review.coreboot.org/c/coreboot/+/41413
- Split old resource allocator into its separate unit:
https://review.coreboot.org/c/coreboot/+/41442
- Reland new allocator guarded by a Kconfig:
https://review.coreboot.org/c/coreboot/+/41443
- Select old allocator for impacted chipsets and enable new allocator
for all other boards: https://review.coreboot.org/c/coreboot/+/41444 https://review.coreboot.org/c/coreboot/+/41445
This should give us some more time to fix the impacted chipsets and keep the boards working in upstream.
On Fri, May 15, 2020 at 11:11 AM Mike Banon mikebdp2@gmail.com wrote:
UPDATE: fam15tn A88XM-E is booting well if all 3 fixes applied together with this patch above (separately not enough to get it working) - tested it enough. However, when I tried a similar fix for fam16kb AM1I-A - it got stuck in a boot loop (see the attached log). Maybe I'm doing something wrong and it worked for fam15tn just by a coincidence. Please take a look at change for a further review https://review.coreboot.org/c/coreboot/+/41431
On Fri, May 15, 2020 at 1:30 PM Mike Banon mikebdp2@gmail.com wrote:
Looking at your change 41369 - soc/amd/stoneyridge: add resources during read_resources() - I tried to do a similar style change on top of your 3 fixes above, and surprisingly it worked at first try - now I'm able to see the boot devices and floppies. New bootlog is attached. After testing it more (should be able to boot 100% of times) I'm going to submit it to review coreboot org soon, for your review - and also we will need to do a similar change for family14 and family16kb if this one succeeds.
diff --git a/src/northbridge/amd/agesa/family15tn/northbridge.c b/src/northbridge/amd/agesa/family15tn/northbridge.c index 9d41e7a1f1..0194ea82ea 100644 --- a/src/northbridge/amd/agesa/family15tn/northbridge.c +++ b/src/northbridge/amd/agesa/family15tn/northbridge.c @@ -666,6 +666,8 @@ static void domain_set_resources(struct device *dev) u32 reset_memhole = 1; #endif
domain_read_resources(dev);
pci_tolm = 0xffffffffUL; for (link = dev->link_list; link; link = link->next) { pci_tolm = find_pci_tolm(link);
@@ -749,17 +751,18 @@ static void domain_set_resources(struct device *dev) }
add_uma_resource_below_tolm(dev, 7);
+/* for (link = dev->link_list; link; link = link->next) { if (link->children) { assign_resources(link); } } +*/ }
static struct device_operations pci_domain_ops = {
.read_resources = domain_read_resources,
.set_resources = domain_set_resources,
.read_resources = domain_set_resources,
.set_resources = pci_domain_set_resources, .scan_bus = pci_domain_scan_bus,
};
On Fri, May 15, 2020 at 12:10 PM Mike Banon mikebdp2@gmail.com wrote:
Although it's still the same result even with three changes (either can't boot or no boot devices, randomly) - there is a positive effect that USB FT232H log now 't stop and I'm finally able to share a full log for a boot problem. Please compare these two logs:
1) ok_e6fb1344ed9188e19be4b54bdf1a76680b8c4523.txt - boot log for the last "working" commit (before the allocator changes)
2) 3fixes.txt - boot log with 3 changes applied on top of 6b95507ec5b087658178a325bdc68570bc48bb20 (after the allocator changes)
I hope this comparison will give enough clues about how to fix it further - and I'll happily test your new changes aimed at fixing this.
Best regards, Mike Banon
On Fri, May 15, 2020 at 2:44 AM Furquan Shaikh furquan.m.shaikh@gmail.com wrote:
I have uploaded 2 changes on top of Aaron's change. Can you please give these three changes a try: https://review.coreboot.org/c/coreboot/+/41363 https://review.coreboot.org/c/coreboot/+/41418 https://review.coreboot.org/c/coreboot/+/41419
Thank you!
- Furquan
On Thu, May 14, 2020 at 4:16 PM Aaron Durbin adurbin@google.com wrote:
On Thu, May 14, 2020 at 3:46 PM Aaron Durbin adurbin@google.com wrote:
> On Thu, May 14, 2020 at 2:46 PM Mike Banon mikebdp2@gmail.com wrote:
>> Unfortunately it seems a lot of boards are affected by this. A88XM-E and Lenovo G505S (AMD fam15h) also got broken: they rarely succeed at booting - and, when they do, no boot devices are available (virtual floppies too, for some reason) - except coreinfo/tint secondary payloads which became prone to freezing. I attach the A88XM-E logs I've been able to obtain with USB FT232H:
>>
>> 1) ok_e6fb1344ed9188e19be4b54bdf1a76680b8c4523.txt - last coreboot repo's revision where all the stuff works
>> 2) fail_1_3b02006afe8a85477dafa1bd149f1f0dba02afc7.txt - this commit got the boards broken for the first time
>> 3) fail_2_6b95507ec5b087658178a325bdc68570bc48bb20.txt - this is a log for coreboot's master top
>>
>> For some reason logs for 2) and 3) always stop after "PCI: 00:12.2 EHCI Debug Port hook triggered".
>>
>> I hope these commits could be reverted before we figure out what's going on with them. Good thing we've noticed it fast enough.
>
> Thanks, Mike. The amd chipset code (all of it from what I can tell) is fundamentally broken and at odds with all of the resource allocation flow. They worked previously because dynamic resources were being assigned using an algorithm that just assumed there weren't collisions, and that was done w/o all the necessary info required for making the proper decisions regarding dynamic resource allocation.
>
> I landed the other chipsets' fixes, but the amd chipset code is going to take a lot more to fix. Would you be willing to test patches as they are crafted? Given the largeness of the problem as well as the gnarly code that is the amd chipset code it's going to take some time so I think we do need to revert the allocator changes until we can do some house keeping.

I just was brainstorming with Furquan. He did push the revert changes, but we were scheming on a patch that I was hoping affected parties could try in conjunction with https://review.coreboot.org/c/coreboot/+/41363. Basically we'll allocate top down like the previous allocator did hoping for no collisions. Let's try that, and see where we land. Regardless we need to fix this amd chipset code as it's a major liability.

> -Aaron
>> Best regards,
>> Mike Banon
Holy moly..
Furquan Shaikh wrote:
> - Reland new allocator guarded by a Kconfig:
I looked at this and the original change.
It's nice that the new algorithm is well commented!
But I could not find any detailed explanation of the necessity for touching much less *rewriting* this quite literally central piece of code in coreboot.
All I could find was "prepare for 64 bit resources". That is beyond weak.
Even worse, I also could not find any explanation of how the new algorithm is different from the old, neither in commit messages nor in comments.
I have to say that I am really amazed and appalled by this change.
I am so relieved that I don't invest heavily in coreboot anymore because I would be furious if I did, not to mention how I would feel if I had been contributing significantly to the existing, working algorithm.
Google, please consider what that means. Regardless of intentions, if you treat everyone else who contributes to the coreboot community with so little respect then you as a primary contributor set an atmosphere where others will not be respectful either. As a result, the project races to the bottom.
This is obviously the classic conflict between one strong contributor working toward their special interest (enable particular future work) and the larger coreboot community having a fundamentally different interest (do not break a working core algorithm).
In such situations, the strong contributor must always carry responsibility, and the responsible action in this case is to ensure that the new code is opt-in, not opt-out.
We should not take for granted that our own stuff is the most important, but rather the other way around.
I want to emphasize that I do not blame Furquan here, but Google the organization. This situation suggests to me that something isn't right in Google's coreboot team. The classic problem would be that there are some impossible (time) constraints, which just ends up being really toxic. :\
Kind regards
//Peter
Peter Stuge peter@stuge.se wrote on Sat, May 16, 2020, 15:39:
> Holy moly..
Indeed...
> All I could find was "prepare for 64 bit resources". That is beyond weak.
To people dealing both with somewhat modern hardware and coreboot, it is well known that the resource allocator has a few crucial weaknesses in that area. See, for example, review.coreboot.org/12575 which tried to merely work around the issue in 2015.
> Even worse, I also could not find any explanation of how the new algorithm is different from the old, neither in commit messages nor in comments.
That may belong in the commit message but certainly not in any comments in the tree.
> I am so relieved that I don't invest heavily in coreboot anymore because I would be furious if I did, not to mention how I would feel if I had been contributing significantly to the existing, working algorithm.
Have you looked at the type of contributions to the resource allocator in the last 2 years? Some fixes for sure but "significant contributions", not so much.
If anything this was because the allocator is a pretty hairy beast that nobody dared to come closer than strictly necessary. That seems to be the one unifying property of resource allocators by the way, given that the last rewrite had a similar genesis.
> This is obviously the classic conflict between one strong contributor working toward their special interest (enable particular future work)
Like allowing dGPUs that announce several gigs of on board RAM, or thunderbolt or USB4, yes. Highly special interests, indeed. (A few of them that Furquan's project doesn't even care about).
> (do not break a working core algorithm).
I guess, if anything, you can complain to me about that: after all, I was the person who submitted that to master and who also held off on reverting it because I see value in getting that thing in.
> We should not take for granted that our own stuff is the most important, but rather the other way around.
I'd treat grandiose organizational psychoanalysis on public forums the same way but I guess everybody has their vice:
> suggests to me that something isn't right in Google's coreboot team. The classic problem would be that there are some impossible (time) constraints, which just ends up being really toxic. :\
Hi Peter,
On 16.05.20 15:39, Peter Stuge wrote:
> But I could not find any detailed explanation of the necessity for touching much less *rewriting* this quite literally central piece of code in coreboot.
I can't speak for Furquan or Google, but I can provide some insight why such work is necessary in general.
The traditional code is rather broken. I can't say if it was like this from the beginning (I didn't witness it), or if it was broken later on. We have many workarounds throughout the code base to keep things running with this code. For instance, we assign all super-io resources manually. If we don't, the allocator tries to assign PCI resources in the narrow window of the super-io. I've traced this particular problem down once, and believe it was introduced about a decade ago, probably also to prepare for 64-bit support. The history is full of half-merged, failed attempts that were probably all wrong.
Support for legacy VGA resources on PCI is broken (we have it working now on modern systems that fully support 16-bit i/o decoding). There are also some hardcoded legacy i/o things that are just wrong.
The existing design also does not allow for bridges with multiple windows. One use case would be Intel's LPC bridges, which we mostly configure manually, but we also got some (broken) code for newer Intel platforms (that conflicts with manually assigned resources).
I guess there are many more quirks that we already got used to and just don't see anymore. And yes, on top, some people wish for 64-bit support.
> All I could find was "prepare for 64 bit resources". That is beyond weak.
> Even worse, I also could not find any explanation of how the new algorithm is different from the old, neither in commit messages nor in comments.
Such an explanation would have been nice, I agree. But I also see a problem here: How to describe an algorithm that just doesn't make any sense? And if we'd make a mistake documenting the existing code, wouldn't that make it worse?
I'm not sure if Google is to blame here. They did some work, reviewed it, merged it. But they were also the first to push a revert. I think the big issue here is a communication issue that I see in the whole community. Nobody uses the mailing list anymore unless something breaks.
I would have wished for an announcement and that more people would have been invited for review. But apart from that, I don't see a reason to quarrel.
Nico
On Sat, May 16, 2020 at 9:07 AM Nico Huber nico.h@gmx.de wrote:
> I would have wished for an announcement and that more people would have been invited for review. But apart from that, I don't see a reason to quarrel.
Thanks Nico! Feedback taken. I will keep that in mind the next time :).
> Nico
> _______________________________________________
> coreboot mailing list -- coreboot@coreboot.org
> To unsubscribe send an email to coreboot-leave@coreboot.org
This is a welcome change, long overdue. It's not surprising there were issues. It's far reaching enough that, if you had time, something in Documentation detailing what you did and why could be very valuable for future contributors?
ron
On Sun, May 17, 2020 at 7:51 AM ron minnich rminnich@gmail.com wrote:
> This is a welcome change, long overdue. It's not surprising there were issues. It's far reaching enough that, if you had time, something in Documentation detailing what you did and why could be very valuable for future contributors?
Sounds good. I will do that.
Hi all,
On 16.05.20 18:07, Nico Huber wrote:
> On 16.05.20 15:39, Peter Stuge wrote:
>> Even worse, I also could not find any explanation of how the new algorithm is different from the old, neither in commit messages nor in comments.
> Such an explanation would have been nice, I agree. But I also see a problem here: How to describe an algorithm that just doesn't make any sense? And if we'd make a mistake documenting the existing code, wouldn't that make it worse?
I'm done reading through both versions. Looks like the old code wasn't that bad and both aren't much different. However, the major difference is exactly what held the old code back: the way to decide where in the resource space we place everything.
Both versions run two passes:
1. Propagate downstream requirements up through the bridges.
2. From top (domain) to bottom (devices), calculate the final place of the resources of one level inside the window of the upper level.
In the first pass, the new code doesn't account for legacy 10-bit PCI I/O, and stops propagating *before* the domain.
In the second pass, the new code doesn't use a single continuous window to place resources in, but uses a memrange list of all available space. For every resource, ordered from largest to smallest alignment, simply the first fitting space in the memrange is chosen. This changes things at the domain level, but not further down at the bridge level, because for the latter the memrange will be continuous. The new code marks resources that it couldn't allocate as `assigned`, which looks like a bug and needs investigation.
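The second-pass placement described above can be sketched in C roughly as follows. This is only a minimal illustration under stated assumptions: `struct free_range`, `struct request` and `first_fit()` are hypothetical names invented for this sketch and are not coreboot's memranges API or `struct resource`, and alignment padding is simply dropped instead of being returned to the free list.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical, simplified types; not coreboot's memranges or struct resource. */
struct free_range { uint64_t base, size; };
struct request { uint64_t size; unsigned int align_log2; uint64_t base; int placed; };

/* Sort helper: largest alignment first. */
static int by_align_desc(const void *a, const void *b)
{
	const struct request *ra = a, *rb = b;

	if (ra->align_log2 != rb->align_log2)
		return ra->align_log2 < rb->align_log2 ? 1 : -1;
	return 0;
}

static uint64_t align_up(uint64_t v, unsigned int align_log2)
{
	uint64_t mask = (1ULL << align_log2) - 1;

	return (v + mask) & ~mask;
}

/*
 * Place every request in the first free range that can hold it.
 * Ranges are assumed to be sorted by ascending base and must not
 * wrap around the 64-bit address space. Returns the number placed.
 */
static size_t first_fit(struct free_range *ranges, size_t nranges,
			struct request *reqs, size_t nreqs)
{
	size_t placed = 0;

	qsort(reqs, nreqs, sizeof(*reqs), by_align_desc);
	for (size_t i = 0; i < nreqs; i++) {
		reqs[i].placed = 0;
		for (size_t j = 0; j < nranges; j++) {
			uint64_t end = ranges[j].base + ranges[j].size;
			uint64_t base = align_up(ranges[j].base, reqs[i].align_log2);

			if (base < ranges[j].base || base > end ||
			    end - base < reqs[i].size)
				continue;
			reqs[i].base = base;
			reqs[i].placed = 1;
			/* Consume the range from the bottom. */
			ranges[j].base = base + reqs[i].size;
			ranges[j].size = end - ranges[j].base;
			placed++;
			break;
		}
	}
	return placed;
}
```

With one window below 4GiB and one above it, a 256MiB request exhausts the low window in this sketch and smaller requests fall through to the high one, which is the kind of domain-level behavior change discussed here.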
But, what did the old code do? As said above, it uses a single continuous window to place resources in. Like a stale comment above dev_configure() states, discovering this window differs for I/O and memory resources:
* I/O resources grow upward. MEM resources grow downward.
Even though this comment was left in place, I'm certain that the new code doesn't do it like that. The old code only stopped in the first pass at the domain resource. So we knew exactly how much space was needed at this level and used this information to limit the available window *at the bottom*. So for memory resources, this window was moved up as far as possible *unless* the calculated size requirement overflowed the window; then it was moved down (this is where the bugs start and why people kept telling me that the allocator would overlap things with DRAM, even though it can print a big "!! Resource didn't fit!!" error).
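To make the "MEM resources grow downward" behavior concrete, here is a small C sketch of placing one block of known total size as high as possible inside a single window. It is an illustration only: `struct window` and `place_top_down()` are hypothetical names, the window limits are made-up values, and the real old allocator is more involved than this.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical single-window description; not a coreboot structure. */
struct window {
	uint64_t base;	/* lowest usable address, e.g. top of DRAM */
	uint64_t limit;	/* highest usable address, inclusive */
};

/*
 * Place a block of `size` bytes as high as possible below w->limit,
 * aligned to 2^align_log2. The computed base is returned even when it
 * falls below w->base; *fits is cleared in that case, which models the
 * window being "moved down" into space it was not supposed to use.
 */
static uint64_t place_top_down(const struct window *w, uint64_t size,
			       unsigned int align_log2, int *fits)
{
	uint64_t mask = (1ULL << align_log2) - 1;
	uint64_t base;

	assert(w->limit >= w->base);
	assert(w->limit < UINT64_MAX);
	if (size > w->limit + 1) {
		*fits = 0;
		return 0;
	}
	base = (w->limit + 1 - size) & ~mask;
	*fits = base >= w->base;
	return base;
}
```

Assuming a window that starts at the top of 768MiB of RAM (0x30000000), a 256MiB requirement lands high up and fits, while an oversized requirement yields a base below the window start, i.e. an overlap with DRAM.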
This allowed us to have two schemes to decide where the memory I/O hole below 4GiB starts. I only learned today that the second scheme was intentionally supported, when I looked at comments and code around find_pci_tolm() (including long removed chipset code):
1. Define the start of the memory I/O hole early. This is what all the Intel code does. We just say (sometimes configure in the devicetree) how much i/o space we want. Then we can reserve the DRAM space and the allocator should avoid it (the old code failed at that, it seems).
2. Don't reserve DRAM space, run the allocator, adapt DRAM mapping to the result. Still used by many AGESA based ports, it seems. Only supported by the old code.
It's probably easy to maintain compatibility with 2. by searching the memranges backwards. However, I doubt that these ports actually still support a moving start of the memory I/O hole, given that they have CBMEM at restore_top_of_low_cacheable(). I will have a look at the latest patches for AMD ports, maybe we don't have to do anything here.
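As a rough idea of what "searching the memranges backwards" could look like, the following C sketch walks a list of free ranges from the highest to the lowest and carves each request from the top of the first range that fits. All names (`struct free_range`, `alloc_backwards()`) are assumptions made for this illustration and do not exist in coreboot; any space left above an allocation due to alignment is dropped.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical free-range entry; ranges are sorted by ascending base. */
struct free_range { uint64_t base, size; };

/*
 * Search the ranges backwards (highest first) and place the request at
 * the highest aligned address of the first range that can hold it.
 * Returns 1 and stores the address in *out on success, 0 otherwise.
 */
static int alloc_backwards(struct free_range *ranges, size_t nranges,
			   uint64_t size, unsigned int align_log2,
			   uint64_t *out)
{
	uint64_t mask = (1ULL << align_log2) - 1;

	for (size_t i = nranges; i-- > 0; ) {
		uint64_t end = ranges[i].base + ranges[i].size;
		uint64_t base;

		if (size == 0 || size > ranges[i].size)
			continue;
		base = (end - size) & ~mask;
		if (base < ranges[i].base)
			continue;
		/* Keep only the part of the range below the allocation. */
		ranges[i].size = base - ranges[i].base;
		*out = base;
		return 1;
	}
	return 0;
}
```

Under scheme 2, the lowest address handed out this way below 4GiB would then be the candidate for the start of the memory I/O hole, which is roughly the role find_pci_tolm() plays in the AGESA ports mentioned above.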
Nico
On Sat, May 16, 2020 at 6:39 AM Peter Stuge peter@stuge.se wrote:
> Holy moly..
> Furquan Shaikh wrote:
>> - Reland new allocator guarded by a Kconfig:
> I looked at this and the original change.
> It's nice that the new algorithm is well commented!
> But I could not find any detailed explanation of the necessity for touching much less *rewriting* this quite literally central piece of code in coreboot.
> All I could find was "prepare for 64 bit resources". That is beyond weak.
> Even worse, I also could not find any explanation of how the new algorithm is different from the old, neither in commit messages nor in comments.
Patrick and Nico already provided some background in their responses. I had made an attempt to capture the weaknesses of and the differences from the old resource allocator as part of the commit message: https://review.coreboot.org/c/coreboot/+/39486. I understand that commit messages can never be too long. There have been several attempts in the past years to add changes here and there to the resource allocator or work around it for different use cases. So, I tried to keep the commit message informative enough to understand the intent. If you think there is more information that could have been helpful to understand this better, I would be very happy to incorporate that in the latest CL before the changes land back in the tree.
> I have to say that I am really amazed and appalled by this change.
> I am so relieved that I don't invest heavily in coreboot anymore because I would be furious if I did, not to mention how I would feel if I had been contributing significantly to the existing, working algorithm.
> Google, please consider what that means. Regardless of intentions, if you treat everyone else who contributes to the coreboot community with so little respect then you as a primary contributor set an atmosphere where others will not be respectful either. As a result, the project races to the bottom.
I apologize if this demonstrated less or lack of respect. That has never been the intent. In my opinion, one of the things that makes coreboot strong is the respect individuals in the community have towards each other and the work that is being done. And I have always valued that. Personally, I have learnt a lot in this community and this has been an attempt to improve coreboot not just for the things I care about but also making it easier and better for everyone. It is no news that modern architecture is changing and coreboot needs to evolve. This change aims to take everyone forward and not just focus on personal gains.
> This is obviously the classic conflict between one strong contributor working toward their special interest (enable particular future work) and the larger coreboot community having a fundamentally different interest (do not break a working core algorithm).
I would say that there were a number of broken assumptions and issues hiding behind the old changes which got exposed by this change. A lot of these issues were just going unnoticed for years now. I don't claim that the new changes are perfect in every sense, but like any other work, I am working with the community to make things better overall.
> In such situations, the strong contributor must always carry responsibility, and the responsible action in this case is to ensure that the new code is opt-in, not opt-out.
> We should not take for granted that our own stuff is the most important, but rather the other way around.
> I want to emphasize that I do not blame Furquan here, but Google the organization. This situation suggests to me that something isn't right in Google's coreboot team. The classic problem would be that there are some impossible (time) constraints, which just ends up being really toxic. :\
I think the situation is kind of opposite. If time constraints were the only thing that we cared about, then like many other changes, this would have been a very isolated change to make that one use case work and add more legacy to carry around.
> Kind regards
> //Peter