Over the weekend I realized that SMI logging was enabled and interfering with WinDbg. Once I flashed a non-serial firmware, WinDbg became a lot more stable and I was able to reliably attach to the boot loader debugger, i.e., `/bootdebug {default}`. The OS debugger (`/debug {default} on`) still wasn't working, though. I wasn't sure if the BSOD was happening in the boot loader or the OS kernel, so I stepped through the boot loader until I saw it jump to the OS. From there WinDbg failed to restore the connection. The exception happens before the OS is capable of writing a kernel dump. I also suspected it was happening before the kernel debugger was set up, since the connection could not be re-established.
I then saw Felix's reply:
To decode the bug check values and their parameters, see https://docs.microsoft.com/en-us/windows-hardware/drivers/debugger/bug-check...
The third parameter you posted decodes to _UID (that one is 4 char ASCII stored as little endian number).
The parameters I gathered last week:
0x0000000000000000 0xFFFFD38AC66EC7F0 0x000000004449555F 0x0000000000000000
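As a quick sanity check of Felix's decoding, here is a minimal Python sketch that treats the low 32 bits of the third parameter as four ASCII characters stored little-endian. The helper name `decode_acpi_name` is made up for illustration.

```python
def decode_acpi_name(value: int) -> str:
    """Read the low 32 bits of value as a 4-character ASCII name, little-endian."""
    low32 = value & 0xFFFFFFFF
    return low32.to_bytes(4, byteorder="little").decode("ascii")


# Third bug check parameter from the list above.
print(decode_acpi_name(0x000000004449555F))  # prints "_UID"
```

The bytes come out as 0x5F, 0x55, 0x49, 0x44, which is `_`, `U`, `I`, `D`.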
The first parameter 0x0 wasn't listed in the table. So this led me to believe that maybe there was a problem parsing the ACPI tables. Felix suggested I use the [Microsoft ASL compiler](https://docs.microsoft.com/en-us/windows-hardware/drivers/bringup/microsoft-...) to decompile the AML and verify if the tables were valid.
I used Linux to dump the ACPI tables via `/sys/firmware/acpi/tables/`, added the `.AML` suffix, and ran `asl.exe /u DSDT.AML`. This printed an error saying `NVSA was already defined`. Using `iasl` to decompile the table I saw the following:
```
External (NVSA)
Name (NVSA, 0xCA6B2000)
OperationRegion (GNVS, SystemMemory, NVSA, 0x1000)
```
The external reference was defined in `.asl`: https://source.chromium.org/chromiumos/chromiumos/codesearch/+/master:src/th... The `Name` node was created by acpigen: https://source.chromium.org/chromiumos/chromiumos/codesearch/+/master:src/th...
Removing the `External` from the `.asl` results in the `iasl` compiler complaining about a missing reference. So I moved the `Name` node to the SSDT table. This resulted in Linux complaining that `GNVS` was invalid because it couldn't find `NVSA`. The ACPI spec says the following:
OperationRegion (RegionName, RegionSpace, Offset, Length)
Operation regions are regions in some space that contain hardware registers for exclusive use by ACPI control methods. ...
The entire Operation Region can be allocated for exclusive use to the ACPI subsystem in the host OS. Operation Regions that are defined within the scope of a method are the exception to this rule. These Operation Regions are known as “Dynamic” since the OS has no idea that they exist or what registers they use until the control method is executed.
I'm guessing that we can't move the `NVSA` node to the SSDT because the value is required when instantiating `OperationRegion`.
I'm not quite sure how to solve this. For now I just hard coded the address in the `OperationRegion`. I'm open to suggestions.
The second problem was `OIPG`. It is also defined twice:
https://source.chromium.org/chromiumos/chromiumos/codesearch/+/master:src/th...
https://source.chromium.org/chromiumos/chromiumos/codesearch/+/master:src/th...
Changing the callback to write to the SSDT table fixed that problem and I was finally able to decode `DSDT.AML` using the `Microsoft ASL Compiler`.
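To catch this kind of clash before Windows does, a rough scan of the decompiled output might help. The sketch below is only a heuristic under stated assumptions: the function name `find_conflicts` and both regexes are mine, it ignores scopes entirely, and it skips reserved names that start with an underscore (such as `_HID`) because those legitimately repeat in every device.

```python
import re
from collections import Counter

# Assumed patterns for iasl-style output, e.g. "Name (NVSA, 0xCA6B2000)"
# and "External (NVSA)". They are not a full ASL parser.
NAME_RE = re.compile(r"\bName\s*\(\s*\\?([A-Z_][A-Z0-9_]{0,3})\s*,")
EXTERNAL_RE = re.compile(r"\bExternal\s*\(\s*\\?([A-Z_][A-Z0-9_.]*)")


def find_conflicts(dsl_text: str) -> dict:
    """Report names that are defined twice, or both External and Name."""
    names = Counter(
        n for n in NAME_RE.findall(dsl_text) if not n.startswith("_")
    )
    # Keep only the last path segment of each External reference.
    externals = {e.split(".")[-1] for e in EXTERNAL_RE.findall(dsl_text)}
    return {
        "defined_twice": sorted(n for n, c in names.items() if c > 1),
        "external_and_name": sorted(n for n in names if n in externals),
    }


sample = """External (NVSA)
Name (NVSA, 0xCA6B2000)
OperationRegion (GNVS, SystemMemory, NVSA, 0x1000)
"""
print(find_conflicts(sample))
# prints {'defined_twice': [], 'external_and_name': ['NVSA']}
```

Because the scan is scope-blind, two unrelated objects with the same name in different scopes would be reported as well, so every hit still needs a manual look.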
I then disabled `/bootdebug` and `/debug` since they weren't providing any value and were preventing me from seeing the BSOD error codes. One thing I noticed was that when rebooting after a BSOD, the boot loader would boot a "system restore" image. This image used a different registry than the OS, so the error codes were not printed on the screen. Rebooting again, the boot loader would load the OS. So each test required a double reboot. I'm also using Tianocore to boot Windows, which is super slow...
The BSOD this time looked identical to the previous one... But upon closer inspection the error code was different:
0x000000000000000D
... I went back and looked at the photo I took of the original BSOD and it was indeed 0x000000000000000D! The font made this easy to miss the first time around. Google Lens didn't pick it up either.
The exception now made sense:
ACPI could not find a required method or object in the namespace This bug check code is used if there is no _HID or _ADR present.
and to re-quote Felix:
The third parameter you posted decodes to _UID (that one is 4 char ASCII stored as little endian number).
So a device was missing a `_UID`. I manually audited all the Device nodes in the DSDT and SSDT and indeed we had devices that were missing `_UID`, and some devices even had duplicate `_UID`s. When I fixed this I got a new BSOD:
0x06 - ACPI tried to find a named object, but it could not find the object. 0x<some pointer> 0x<some pointer> 0x<some pointer>
This was discouraging since I didn't have a way of dereferencing the pointers. I decided to double check the `SPCR` and the `DBG2` tables. The `DBG2` table was using `MMIO` while the `SPCR` was using `IO`. So I switched the `DBG2` table over to `IO` since I knew that worked, set `/debug {default} on`, and voila, OS kernel debugger!
Doing `!analyze -v` showed the error and parameters. It did lock up trying to print the details section. Hitting the `Break` button cancelled the operation and I was able to continue. A simple `!nsobj <pointer>` showed that the FUR0 power resource wasn't being found: https://source.chromium.org/chromiumos/chromiumos/codesearch/+/master:src/th... I suspect it's because the `AOAC` [node](https://source.chromium.org/chromiumos/chromiumos/codesearch/+/master:src/th...) is defined as a bridge device. I commented out all the `AOAC` and power resources and was then greeted with:
0x1000D - _PRW specified with no wake-capable interrupts and at least one GPIO interrupt A device used both GPE and GPIO interrupts, which is not supported.
If I understand it correctly, that means that we can't use a GPE in the _PRW and a GPIO in the _CRS.
i.e.,

    Device (CRFP)
    {
        Name (_HID, "PRP0001")  // _HID: Hardware ID
        Name (_UID, Zero)  // _UID: Unique ID
        Name (_DDN, "Fingerprint Reader")  // _DDN: DOS Device Name
        Name (_CRS, ResourceTemplate ()  // _CRS: Current Resource Settings
        {
            UartSerialBusV2 (0x002DC6C0, DataBitsEight, StopBitsOne,
                0x00, LittleEndian, ParityTypeNone, FlowControlNone,
                0x0040, 0x0040, "\_SB.FUR1",
                0x00, ResourceConsumer, , Exclusive,
                )
            GpioInt (Level, ActiveLow, Exclusive, PullDefault, 0x0000,
                "\_SB.GPIO", 0x00, ResourceConsumer, ,
                )
                {   // Pin list
                    0x0006
                }
        })
        Name (_S0W, 0x04)  // _S0W: S0 Device Wake State
        Name (_PRW, Package (0x02)  // _PRW: Power Resources for Wake
        {
            0x0A,
            0x03
        })
    }
I'm guessing the _PRW needs to reference the GPIO controller, and that controller must have an `_AEI` defining the pin. Not really sure why Windows has a problem with mixed event types. For now I just commented out all the I2C and UART peripherals.
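To find the other devices that might hit the same bug check, something like the following Python sketch could scan decompiled `iasl` output for `Device` blocks that contain both a `Name (_PRW, ...)` and a `GpioInt`. It is a heuristic built on my reading above, not on a documented Windows rule. The function names are made up, it uses plain brace matching rather than a real ASL parser, a `_PRW` defined as a `Method` is not detected, and a parent device is flagged along with any nested child that matches.

```python
import re

DEVICE_RE = re.compile(r"\bDevice\s*\(\s*(\w+)\s*\)\s*\{")
PRW_RE = re.compile(r"\bName\s*\(\s*_PRW\b")
GPIO_INT_RE = re.compile(r"\bGpioInt\s*\(")


def device_bodies(dsl_text: str):
    """Yield (name, body) for each Device block, found by brace matching."""
    for match in DEVICE_RE.finditer(dsl_text):
        depth = 1  # the regex already consumed the opening brace
        pos = match.end()
        while pos < len(dsl_text) and depth:
            if dsl_text[pos] == "{":
                depth += 1
            elif dsl_text[pos] == "}":
                depth -= 1
            pos += 1
        yield match.group(1), dsl_text[match.end():pos - 1]


def mixed_wake_devices(dsl_text: str) -> list:
    """Return names of devices that have both a _PRW and a GpioInt."""
    return [
        name
        for name, body in device_bodies(dsl_text)
        if PRW_RE.search(body) and GPIO_INT_RE.search(body)
    ]
```

Running `mixed_wake_devices()` on the text of a decompiled DSDT or SSDT returns a list of device names to review by hand; for the `CRFP` device above it would report `CRFP`.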
With all that I was finally able to boot into Windows!
Now on to making some CLs and fixing the remaining issues.
On Fri, Jan 15, 2021 at 5:52 PM Felix Held felix-coreboot@felixheld.de wrote:
Forgot to add that to find out what the cause is the easiest way is probably having the installed image configured in a way that it'll write full kernel memory dumps to disk and then use !analyze -v in WinDbg on that generated kernel dump. At least that's what I remember from more than 1.5 years ago, so some of the info might not be 100% accurate.
Regards, Felix