Hi Ragnaros,
to solve (some of) the hangs I highly recommend disabling the serial console completely as discussed here:
https://mail.coreboot.org/hyperkitty/list/coreboot@coreboot.org/thread/GA3WY...
Using 16 GB sticks did not work for me either, but I found a configuration that works if you use only some of them together with 8 GB sticks, resulting in at least 96 GB of fully working memory in a single-CPU configuration. This might also work with 192 GB in a dual-CPU configuration. See my posting here for details:
https://mail.coreboot.org/hyperkitty/list/coreboot@coreboot.org/message/A63P...
If you have only 16 GB sticks available, use the orange slots only, or just a single DIMM in the first orange slot, before going any further. It doesn't make sense to diagnose issues such as graphics output or non-booting payloads when you haven't ruled out improperly working memory first.
Gfx initialization with a dedicated NVIDIA GPU and booting from NVMe works fine for me, although I'm not using 4.11 but an older version (master somewhere between 4.8 and 4.9). I just use text-based ("legacy") graphics without a splash screen and boot the kernel from there, which then initializes the graphics properly and switches to graphics mode. No issues with this so far.
Regarding the fans: If you run Linux, you can control them in userland using a tool such as "fancontrol" if you don't want to control them via the BMC or don't have it installed. The fancontrol profile I created for this (attached) results in the following RPM and temperature levels at the moment in my (computer) case:
GPU core: +0.90 V (min = +0.90 V, max = +0.90 V)
temp1: +60.0°C (high = +95.0°C, hyst = +3.0°C)
       (crit = +105.0°C, hyst = +5.0°C)
       (emerg = +135.0°C, hyst = +5.0°C)

fam15h_power-pci-00c4
Adapter: PCI adapter
power1: 51.28 W (crit = 114.49 W)

...

fan1: 596 RPM (min = 329 RPM)
fan2: 584 RPM (min = 329 RPM)
fan3: 0 RPM (min = 329 RPM) ALARM
fan4: 0 RPM (min = 329 RPM) ALARM
fan5: 585 RPM (min = 329 RPM)
fan6: 696 RPM (min = 329 RPM)
fan7: 0 RPM (min = 329 RPM) ALARM
fan8: 0 RPM (min = 329 RPM) ALARM
temp1: +68.2°C (high = +70.0°C, hyst = +65.0°C)
       (crit = +85.0°C, hyst = +80.0°C) sensor = thermal diode
temp7: +25.8°C (high = +70.0°C, hyst = +65.0°C)
       (crit = +85.0°C, hyst = +80.0°C) sensor = AMD AMDSI
temp8: +0.0°C (high = +70.0°C, hyst = +65.0°C)
       (crit = +85.0°C, hyst = +80.0°C) sensor = AMD AMDSI

...

k10temp-pci-00cb
Adapter: PCI adapter
temp1: +22.9°C (high = +70.0°C)
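For anyone who wants to check readings like these from a script, here is a minimal Python sketch that picks out the fans reporting below their minimum speed. It assumes one reading per line, as `sensors` normally prints them; the function name and the regular expression are only an illustration and not part of lm-sensors.

```python
import re

# Matches lines such as "fan3: 0 RPM (min = 329 RPM) ALARM".
FAN_LINE = re.compile(r"^(fan\d+):\s+(\d+) RPM\s+\(min =\s+(\d+) RPM\)")

def fans_below_minimum(sensors_text):
    """Return the names of fans whose reported RPM is below their minimum."""
    low = []
    for line in sensors_text.splitlines():
        m = FAN_LINE.match(line.strip())
        if m and int(m.group(2)) < int(m.group(3)):
            low.append(m.group(1))
    return low

# A few of the readings quoted above, one per line.
sample = """\
fan1: 596 RPM (min = 329 RPM)
fan3: 0 RPM (min = 329 RPM) ALARM
fan6: 696 RPM (min = 329 RPM)
fan7: 0 RPM (min = 329 RPM) ALARM
"""
print(fans_below_minimum(sample))
```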
Normally, my fans run at around 350-400 RPM at idle and are barely audible. The increased speeds are probably because I haven't cleaned the dust filters for a while. If you use the profile, you will probably need to adjust the devpath, but you can also create your own with pwmconfig and then tweak the values.
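To give an idea of what the values you tweak actually do: fancontrol essentially scales the PWM value between a lower and an upper temperature limit. The Python sketch below is a simplified illustration of that idea, not the actual fancontrol script (which also takes MINSTART and MINSTOP into account). The parameter names mirror the config keys MINTEMP, MAXTEMP, MINPWM and MAXPWM, and the numbers are made-up examples rather than values from the attached profile.

```python
def pwm_for_temp(temp_c, min_temp=40, max_temp=70, min_pwm=55, max_pwm=255):
    """Map a temperature in degrees C to a PWM value by linear interpolation.

    Simplified illustration only; all default values are hypothetical.
    """
    if temp_c <= min_temp:
        return min_pwm
    if temp_c >= max_temp:
        return max_pwm
    fraction = (temp_c - min_temp) / (max_temp - min_temp)
    return int(min_pwm + fraction * (max_pwm - min_pwm))

for temp in (30, 55, 80):
    print(temp, pwm_for_temp(temp))
```

Raising the lower temperature limit or lowering the minimum PWM makes the fans quieter at idle, at the cost of a later ramp-up.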
Cheers, Daniel
Thanks for the suggestions, everyone.
I currently let SeaBIOS handle the video card's initialization, as there are currently no other options to choose from in nconfig/menuconfig regarding graphics init and framebuffer mode for this board. It's just that the "Include generated option rom that implements legacy VGA BIOS compatibility" option might cause issues. Disabling it fixed most of the video issues and it's currently working fine.
I currently don't have the BMC installed, but I do have the module with me. Is there an up-to-date build guide for OpenBMC on this board? The build guide on Raptor Engineering's site looks outdated, as some repo links were dead/moved when I tried to follow it. By the way, I'm currently building coreboot images on the latest Manjaro (Arch-based) just fine (as the required toolchains were also built). Can OpenBMC (as well as the necessary toolchains) be built here as well, or do I need to set up a VM using an earlier, recommended distro for the purpose?
As for the serial console, I think I'll disable it once all other issues are solved. Currently I can always reach the SeaBIOS boot menu with my current memory configuration, and some of the floppy images are booting just fine. It's just that the secondary payloads (memtest, nvramcui, coreinfo, tint) are having issues.
It seems Mike Banon's patch for enabling support for multiple floppy images does work, and I can now see and choose all the floppy images I put into the image, but other patches (including one which properly modifies the boot selection part so I can select boot entries past number 10) could not be applied out of the box and require some manual adjustments, even on SeaBIOS 1.12.1.
I think existing RAM HCL entries which corresponded to Libreboot versions may not be considered reliable, as some memory configurations that worked with Libreboot do not work here (which was also the case for KCMA-D8 that has been discussed before).
Some updates regarding the experiments:
1. Switched SeaBIOS back to master and it turned out this time all Mike Banon's patches applied successfully, so those patches were indeed for master. Not sure why some of these patches are marked as "Failed in applying to current master" in patchew.org's catalog. I can now properly choose which floppy image (or real device) to boot. Still, I'd like to be able to change some boot orders later on so I can let the system auto-boot to the first hard disk if needed.
2. I disabled the serial console as it appeared that memtest is actually outputting things there as well... something like: [LINE_SCROLL;24r[H[2J[37m[44m[0m[37m[44m[1;1HMemtest86+ 5.01 coreboot 002[0m[6;61H| Time: 0:00:00[2;31HPass %[3;31HTest %[4;31HTest #[5;31HTesting: [6;31HPattern: [2;1HCLK: (32b Mode)[3;1HL1 Cache: Unknown [4;1HL2 Cache: Unknown [5;1HL3 Cache: None [6;1HMemory : [7;1H------------------------------------------------------------------------------[8;1HCore#:[9;1HState:[10;1HCores: Active / Total (Run: All) | Pass: 0 Errors: 0 [11;1H------------------------------------------------------------------------------[8;40H| Chipset : Unknown[9;40H| Memory Type : Unknown[1;29H| [2;29H| [3;29H| [4;29H| [5;29H| [6;29H| [25;1H(ESC)exit (c)configuration (SP)scroll_lock (CR)scroll_unlock[6;12H128[6;15HG[1;31HAMD Opteron(tm) Processor 6386 SE[2;11HMHz[2;6H2800[3;11H K [3;13H64[3;24HMB/s[3;18H28867[4;11H K [4;11H2048[4;24HMB/s[4;18H23932[5;12H K [5;13H12[5;15HM[5;24HMB/s[5;19H8383[19;19H==> Press F1 to enter Fail-Safe Mode <==[20;16H==> Press F2 to force Multi-Threading (SMP) <==[19;19H [20;16H [8;8H0[9;8HS[10;21H1[8;10H(SMP: Disabled)[9;10HRunning...[8;42HRAM: [8;47H800 [8;51HMHz ([8;56HDDR3-[8;61H1600[8;65H)[8;67H- BCLK: [8;76H87[9;42HTimings: CAS [9;55H11[9;57H-[9;58H11[9;60H-[9;61H11[9;63H-[9;64H28[9;67H@ 128-bit Mode[24;34HASUS[24;39HKGPE-D16[13;1HMemory SPD Informations[14;1H--------------------------[9;32H22[8;27H| CPU Temp[9;27H| C[2;17HPAE Mode)[2;17HX64 Mode)[6;24HMB/s[4;37H2 [4;40H[Address test, own address Parallel] [9;8HW[10;9H1[9;8H-[6;57HR[5;43H0[5;44HK[5;46H- [5;50H32[5;52HM[5;58H32[5;60HM[5;62Hof [5;66H128[5;69HG[6;42Haddress [3;37H0[2;37H0[6;77H1[9;32H24
However, disabling the serial console made little difference, as the situation is the same as usual (memtest crashes with the same glitched screen). I checked the post regarding the hangs, but it seems coreboot itself has changed so much over time that there were way too many differences in the .config file.
3. After some testing, using the following combinations for 16GB sticks can also make the system bootable with all 128GB recognized: D1/D2, B1/B2, F1/F2, H1/H2. For 64GB (4 sticks), using the 1 (orange) slots would do. However, this also does not make any difference. I currently don't have any 8GB registered sticks at hand for testing, but does fully populating the slots with 8GB sticks (16 x 8GB = 128GB) work? If so, does the brand (Samsung/Micron/Hynix) matter?
4. It seems that at present not everything works out of the box, and I think the issue of "Not enough memory creating EHCI periodic frame list." might require some immediate attention (something like changing the stack and heap sizes, which might also affect other things).
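The dump in item 2 becomes readable once the cursor-positioning sequences are interpreted. Below is a rough Python sketch that replays such a stream onto a 25x80 text grid. It assumes the raw serial capture still contains the ESC byte in front of each `[` (it is missing from the paste above), and it only handles cursor positioning (`H`) and clear screen (`2J`); colours and everything else are skipped. The function name is purely illustrative.

```python
import re

# CSI sequence: ESC [ params final-letter, e.g. ESC[6;61H or ESC[2J.
CSI = re.compile(r"\x1b\[([0-9;]*)([A-Za-z])")

def render(stream, rows=25, cols=80):
    """Replay cursor-positioned text onto a rows x cols grid of characters."""
    screen = [[" "] * cols for _ in range(rows)]
    row = col = 0

    def put(text):
        # Write text at the current cursor, dropping anything off-screen.
        nonlocal col
        for ch in text:
            if 0 <= row < rows and 0 <= col < cols:
                screen[row][col] = ch
            col += 1

    pos = 0
    for m in CSI.finditer(stream):
        put(stream[pos:m.start()])
        pos = m.end()
        params, cmd = m.groups()
        if cmd == "H":  # cursor position, 1-based "row;col", defaults to 1;1
            parts = (params.split(";") + [""])[:2]
            row = int(parts[0] or 1) - 1
            col = int(parts[1] or 1) - 1
        elif cmd == "J" and params == "2":  # clear the whole screen
            for line in screen:
                line[:] = [" "] * cols
        # All other sequences (colours, scroll region, ...) are ignored.
    put(stream[pos:])
    return ["".join(line).rstrip() for line in screen]

# Short excerpt modelled on the dump, with the ESC bytes restored.
sample = "\x1b[2J\x1b[37m\x1b[1;1HMemtest86+ 5.01\x1b[0m\x1b[6;61H| Time: 0:00:00"
lines = render(sample)
print(lines[0])
print(lines[5].strip())
```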
The system is currently at best usable for booting some popular floppy images (and discovering some interesting use cases). I've tested some floppy images, like Menuet64, tatOS, Floppybird, and they are currently working without major issues. I still have a few more images included for testing, and I think I may consider booting something else to check whether the current memory configuration is really the root cause of some issues or not.
A little more update.
I did an experiment using only the PS/2 keyboard and mouse (without connecting the USB ones), as I realized that USB devices don't work in the floppy images that boot and bring their own USB stacks.
1. coreinfo now outputs one of these two errors instead of the "Not enough memory creating EHCI periodic frame list." I used to get, although the first error message only appeared once:
   Not enough DMA memory for OHCI HCCA.
   drivers/usb/usb.c:42 new_controller(): Failed to malloc 588 bytes.
2. At one time I managed to make memtest enter SMP mode. However, I only did the first test, as during the procedure (address test, walking ones, no cache) I got errors every 2GB past 32GB, with the good and bad ones showing inverted contents (bad is ffffffff when good is 00000000, and vice versa).
That was the only time I managed to get memtest to enter the test phase. With other attempts (and without entering SMP mode), I got the same glitched screen.
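The inverted pattern from item 2 can be expressed as an XOR of the expected and the actual word: if every bit is set in the result, the whole word was flipped rather than a few bits being stuck. A small Python illustration, using only the two values quoted above (the helper name is made up):

```python
def flipped_bits(good, bad, width=32):
    """Return a mask of the bit positions where good and bad differ."""
    mask = (1 << width) - 1
    return (good ^ bad) & mask

for good, bad in [(0x00000000, 0xFFFFFFFF), (0xFFFFFFFF, 0x00000000)]:
    diff = flipped_bits(good, bad)
    count = bin(diff).count("1")
    print(f"good={good:08x} bad={bad:08x} xor={diff:08x} differing bits={count}")
```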
So it seems there are issues with both the memory configuration and the USB-related code. Unfortunately, it seems large RAM configurations (like 8-16 sticks of 8-16GB) were only tested to a limited extent, so there might still be some bugs that don't surface with small RAM configurations (like 1-4 sticks of 4-8GB), which were tested more often.