Dear coreboot folks,
Since Linux 3.12, the graphics stolen memory seems to be used, which causes a regression with coreboot and native graphics init, at least on the Intel 945 based Lenovo X60 [1].
This seems to be solved by properly setting the register `PGETBL_CTL` [2]. The Linux kernel Intel graphics driver still reports a GTT (Graphics Translation Table) related error during start-up, but everything else seems to work.
[ 1.235640] [drm] GPU crash dump saved to /sys/class/drm/card0/error
[ 1.236583] [drm] GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
[ 1.236583] [drm] Please file a _new_ bug report on bugs.freedesktop.org against DRI -> DRM/Intel
[ 1.236583] [drm] drm/i915 developers can then reassign to the right component if it's not a kernel issue.
[ 1.236583] [drm] The gpu crash dump is required to analyze gpu hangs, so please always attach it.
[ 1.236583] i915: render error detected, EIR: 0x00000010
[ 1.236583] i915: page table error
[ 1.236583] i915: PGTBL_ER: 0x00000013
[ 1.236583] [drm:i915_report_and_clear_eir] *ERROR* EIR stuck: 0x00000010, masking
[ 1.236583] i915: render error detected, EIR: 0x00000010
[ 1.236583] i915: page table error
[ 1.236583] i915: PGTBL_ER: 0x00000013
Looking further into the native graphics init code for Intel IGDs, I wonder what `setgtt()` does.
Looking at Google Link, where this was introduced, there is the code below.
$ nl -ba src/mainboard/google/link/i915.c
[…]
    95  void io_i915_WRITE32(unsigned long val, unsigned long addr)
    96  {
    97          if (verbose & vio)
    98                  printk(BIOS_SPEW, "%s: outl %08lx\n", regname(addr), val);
    99          outl(addr, addrport);
   100          outl(val, dataport);
   101  }
   102
   103
   104  /*
   105  2560
   106  4 words per
   107  4 *p
   108  10240
   109  4k bytes per page
   110  4096/p
   111  2.50
   112  1700 lines
   113  1700 * p
   114  4250.00
   115  PTEs
   116  */
   117  static void
   118  setgtt(int start, int end, unsigned long base, int inc)
   119  {
   120          int i;
   121
   122          for(i = start; i < end; i++){
   123                  u32 word = base + i*inc;
   124                  WRITE32(word|1,(i*4)|1);
   125          }
   126  }
[…]
   349          /* GTT is the Global Translation Table for the graphics pipeline.
   350           * It is used to translate graphics addresses to physical
   351           * memory addresses. As in the CPU, GTTs map 4K pages.
   352           * There are 32 bits per pixel, or 4 bytes,
   353           * which means 1024 pixels per page.
   354           * There are 4250 GTTs on Link:
   355           * 2650 (X) * 1700 (Y) pixels / 1024 pixels per page.
   356           * The setgtt function adds a further bit of flexibility:
   357           * it allows you to set a range (the first two parameters) to point
   358           * to a physical address (third parameter);the physical address is
   359           * incremented by a count (fourth parameter) for each GTT in the
   360           * range.
   361           * Why do it this way? For ultrafast startup,
   362           * we can point all the GTT entries to point to one page,
   363           * and set that page to 0s:
   364           * memset(physbase, 0, 4096);
   365           * setgtt(0, 4250, physbase, 0);
   366           * this takes about 2 ms, and is a win because zeroing
   367           * the page takes a up to 200 ms. We will be exploiting this
   368           * trick in a later rev of this code.
   369           * This call sets the GTT to point to a linear range of pages
   370           * starting at physbase.
   371           */
   372          setgtt(0, FRAME_BUFFER_PAGES, physbase, 4096);
   373          printk(BIOS_SPEW, "memset %p to 0 for %d bytes\n",
   374                  (void *)graphics, FRAME_BUFFER_BYTES);
[…]
$ git grep FRAME_BUFFER src/mainboard/google/link/i915io.h
src/mainboard/google/link/i915io.h:#define FRAME_BUFFER_PAGES ((2560*1700)/1024)
src/mainboard/google/link/i915io.h:#define FRAME_BUFFER_BYTES (FRAME_BUFFER_PAGES*4096)
In the code for the Lenovo X60, which has a resolution of 1024×768, the number of pages is not 768, as the `FRAME_BUFFER_PAGES` formula would give, but the higher number 800. Otherwise there is graphical corruption on the GRUB screen.
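To make the page arithmetic concrete, here is a minimal sketch of how the number of GTT entries for a linear framebuffer could be computed from the resolution. The function name `fb_pages` and the assumption that the stride equals width times bytes per pixel (no padding) are mine and are not taken from the coreboot tree.

```c
#include <assert.h>
#include <stdint.h>

#define GTT_PAGE_SIZE 4096u /* one GTT entry maps one 4 KiB page */

/* Hypothetical helper: number of GTT entries needed to cover a linear
 * framebuffer, rounded up to whole pages. Assumes no stride padding. */
uint32_t fb_pages(uint32_t width, uint32_t height, uint32_t bytes_per_pixel)
{
	uint64_t bytes = (uint64_t)width * height * bytes_per_pixel;

	assert(bytes > 0);
	return (uint32_t)((bytes + GTT_PAGE_SIZE - 1) / GTT_PAGE_SIZE);
}
```

With 4 bytes per pixel this gives 768 pages for 1024×768 and 4250 pages for 2560×1700, matching the numbers above. It does not explain why the X60 code needs 800 rather than 768; that margin is still an open question.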
Furthermore, Vladimir’s native graphics patch, up for review in Gerrit [3], uses the code below.
    /* Setup GTT. */
    for (i = 0; i < 0x2000; i++) {
            outl((i << 2) | 1, piobase);
            outl(pphysbase + (i << 12) + 1, piobase + 4);
    }
This is equivalent to `setgtt(0, 8192, physbase, 4096)`. As the code should not be hardcoded to one resolution (for example, the Lenovo T60 has a different one than the Lenovo X60), it should be dynamic.
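The equivalence can be checked without hardware by replacing the two `outl()` calls with a log of (index, value) pairs. The sketch below is only a model: the logging helpers and the base address used when calling it are made up for illustration, and the two loops only agree when the base is 4 KiB aligned, since one uses `| 1` and the other `+ 1`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_WRITES 0x2000

/* One recorded (index, value) pair, standing in for the two outl() calls. */
struct gtt_write {
	uint32_t index;
	uint32_t value;
};

struct gtt_log {
	struct gtt_write w[MAX_WRITES];
	size_t n;
};

void log_write(struct gtt_log *log, uint32_t index, uint32_t value)
{
	assert(log->n < MAX_WRITES);
	log->w[log->n].index = index;
	log->w[log->n].value = value;
	log->n++;
}

/* Same arithmetic as setgtt() in the Link code, with I/O replaced by logging. */
void setgtt_logged(struct gtt_log *log, int start, int end, uint32_t base, int inc)
{
	for (int i = start; i < end; i++) {
		uint32_t word = base + (uint32_t)i * (uint32_t)inc;

		log_write(log, ((uint32_t)i * 4) | 1, word | 1);
	}
}

/* Same arithmetic as the loop in the i945 patch. */
void i945_loop_logged(struct gtt_log *log, uint32_t pphysbase)
{
	for (uint32_t i = 0; i < 0x2000; i++)
		log_write(log, (i << 2) | 1, pphysbase + (i << 12) + 1);
}

/* Returns 1 if both logs contain the same writes in the same order. */
int logs_equal(const struct gtt_log *a, const struct gtt_log *b)
{
	if (a->n != b->n)
		return 0;
	for (size_t i = 0; i < a->n; i++) {
		if (a->w[i].index != b->w[i].index || a->w[i].value != b->w[i].value)
			return 0;
	}
	return 1;
}
```

Calling `setgtt_logged(&a, 0, 8192, base, 4096)` and `i945_loop_logged(&b, base)` with the same page-aligned `base` should make `logs_equal(&a, &b)` return 1.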
Unfortunately, I did not find anything in the datasheets [4][5] explaining why this `setgtt()` is needed at all. I guess this was figured out by tracing the Video BIOS? Ron, Denis, Peter, Vladimir, it’d be great if you could explain that to me.
At least on Intel 945, the GTT is 256 KB in size and is placed below TOLUD in the graphics stolen memory, which is currently hardcoded to 8 MB. So how should the pages be set up then?
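For orientation, the sizes mentioned so far can be related with a little arithmetic. This is only a sketch under the assumption that each GTT entry is 4 bytes wide and maps one 4 KiB page, which is what the `i*4` and `i << 2` indexing in the code above suggests; the helper names are made up.

```c
#include <assert.h>
#include <stdint.h>

#define GTT_PTE_BYTES  4u    /* assumed size of one GTT entry */
#define GTT_PAGE_BYTES 4096u /* one entry maps one 4 KiB page */

/* Number of entries that fit into a GTT of the given size in bytes. */
uint32_t gtt_entry_count(uint32_t gtt_bytes)
{
	assert(gtt_bytes % GTT_PTE_BYTES == 0);
	return gtt_bytes / GTT_PTE_BYTES;
}

/* Amount of graphics address space covered by that many entries. */
uint64_t gtt_mapped_bytes(uint32_t entries)
{
	return (uint64_t)entries * GTT_PAGE_BYTES;
}
```

Under these assumptions a 256 KiB GTT holds 65536 entries and can cover 256 MiB of graphics address space, while the 0x2000 entries written by the i945 loop cover 32 MiB. An 8 MiB stolen region corresponds to 2048 pages.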
Thanks,
Paul
[1] https://bugs.freedesktop.org/show_bug.cgi?id=79038
[2] http://review.coreboot.org/5927
[3] http://review.coreboot.org/#/c/5320/9/src/northbridge/intel/i945/gma.c
[4] http://www.intel.com/content/dam/www/public/us/en/documents/datasheets/mobil...
[5] https://01.org/linuxgraphics/sites/default/files/documentation/965_g35_vol_1...
Paul, many good questions. The problem is I did all this over 2 years ago and the answers won't be as good. Vladimir and Peter will likely do better. And, to reiterate, the 'replay attack' code for Link is not a good example of how to start graphics. It Just Worked, but not well. The second iteration Furquan and I did on Falco and Pepper is really what we need to use.
So let's take it a bit at a time.
The GTT is a page table used by the graphics hardware to address memory. It translates graphics addresses to physical addresses.
Part of the hardcoding of size is because in some cases the chipset has a limited choice of sizes -- you set a bit, it takes a certain fixed size. 8 MiB is common to the older chipsets. At the same time, once the kernel starts, it's going to change the GTT anyway -- we're mainly doing GTT setup here to make it easy to paint the boot splash screens. That's it.
Note that the GTT allows the graphics hardware to address anything in the low 4 GiB, and even more on some chipsets, where another set of address bits is stashed in the low 12 bits.
You can address the memory pointed to by the GTT. Use the address in one of the BARs -- IIRC, BAR0? I don't recall, maybe BAR2 -- as your address. Your references are then mapped back to physical memory addresses via the GTT mappings.
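As a rough illustration of that mapping, the lookup can be modelled in plain C. This is a simplified sketch, not chipset documentation: it assumes 32-bit entries with bit 0 as the valid bit (as the `| 1` in the code earlier in the thread suggests), 4 KiB pages, and it ignores the extra address bits some chipsets keep in the low 12 bits. All names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define GTT_PTE_VALID  0x1u
#define GTT_PAGE_SHIFT 12
#define GTT_PAGE_MASK  0xfffu

/* Translate a graphics address through a software copy of the GTT.
 * Returns false if the address is outside the table or the entry
 * is not marked valid; otherwise stores the physical address. */
bool gtt_translate(const uint32_t *gtt, size_t entries,
		   uint32_t gfx_addr, uint32_t *phys)
{
	size_t index = gfx_addr >> GTT_PAGE_SHIFT;
	uint32_t pte;

	if (index >= entries)
		return false;

	pte = gtt[index];
	if (!(pte & GTT_PTE_VALID))
		return false;

	/* Page frame from the entry, byte offset from the graphics address. */
	*phys = (pte & ~GTT_PAGE_MASK) | (gfx_addr & GTT_PAGE_MASK);
	return true;
}
```

For example, with a first entry of `0x10000001`, graphics address `0x123` would translate to physical address `0x10000123`, while an address whose entry has bit 0 clear would be rejected.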
GSM is an odd duck. It's used on some chipsets for framebuffer compression IIRC -- corrections, anyone? So it's this weird piece of hidden-away memory assigned to the chipset for its own uses. It's not strongly tied to the GTT.
But framebuffer compression was full of bugs and hard to get working, and it seems it has only recently been figured out. In older chipsets, e.g. 945, you set a bit and it takes a fixed size.
Note that GSM size is really unrelated except in the most gross sense to display resolution or depth, again, IIRC.
Also note that the GTT is just a way to map graphics chipset references to physical addresses, and again is not really related to resolution or depth, except in two ways:
- if you don't create enough entries for the resolution, you have a problem
- if you create too many, you might waste memory
The second problem is not as serious as the first. So we just grab a lot of memory and map it, because the coreboot mappings will be replaced anyway by the driver.
The GTT settings are fairly safe and I think you're on the wrong track in terms of your debugging, but I could be wrong.
You should not assume that it's NOT a Linux driver problem. The Linux drivers are not well tested against coreboot, and we know of cases where they worked against the BIOS, but only by mostly luck. Sound familiar :-) ? It's the problem we've always had.
I would question this change:
- PGETBL_CTL: 0x3ffc0001
+ PGETBL_CTL: 0x3f800001
because it doesn't come with an explanation of what it's supposed to fix. I've found lots of incorrect BIOS settings that worked largely by accident with the Linux driver. I think you need to go a bit deeper. What did the one bit you changed do, and why?
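A quick way to start on that question is to XOR the old and new register values and look at which bit positions differ. The sketch below does only that arithmetic; the function names are made up, and what the differing bits mean has to come from the datasheet.

```c
#include <stdint.h>

/* Mask of the bit positions that differ between two register values. */
uint32_t changed_bits(uint32_t before, uint32_t after)
{
	return before ^ after;
}

/* Number of set bits in a mask. */
unsigned int count_bits(uint32_t mask)
{
	unsigned int n = 0;

	while (mask) {
		n += mask & 1u;
		mask >>= 1;
	}
	return n;
}
```

For the two values in the diff, `changed_bits(0x3ffc0001, 0x3f800001)` is `0x007c0000`, that is bits 18 through 22.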
But be careful about tying GSM and GTT together too much.
thanks
ron
Paul Menzel paulepanter@users.sourceforge.net once said:
Since Linux 3.12, the graphics stolen memory seems to be used, which causes a regression with coreboot and native graphics init, at least on the Intel 945 based Lenovo X60 [1].
This seems to be solved by properly setting the register `PGETBL_CTL` [2]. The Linux kernel Intel graphics driver still reports a GTT (Graphics Translation Table) related error during start-up, but everything else seems to work.
[ 1.235640] [drm] GPU crash dump saved to /sys/class/drm/card0/error
[ 1.236583] [drm] GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
[ 1.236583] [drm] Please file a _new_ bug report on bugs.freedesktop.org against DRI -> DRM/Intel
[ 1.236583] [drm] drm/i915 developers can then reassign to the right component if it's not a kernel issue.
[ 1.236583] [drm] The gpu crash dump is required to analyze gpu hangs, so please always attach it.
[ 1.236583] i915: render error detected, EIR: 0x00000010
[ 1.236583] i915: page table error
[ 1.236583] i915: PGTBL_ER: 0x00000013
[ 1.236583] [drm:i915_report_and_clear_eir] *ERROR* EIR stuck: 0x00000010, masking
[ 1.236583] i915: render error detected, EIR: 0x00000010
[ 1.236583] i915: page table error
[ 1.236583] i915: PGTBL_ER: 0x00000013
FWIW, I will occasionally get the same message and the apparent working behavior after resume on a MacBook2,1 using stock Apple EFI firmware. The MacBook uses the same chipset as the X60.
[49087.421096] [drm] GPU crash dump saved to /sys/class/drm/card0/error
[49087.421096] [drm] GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
[49087.421097] [drm] Please file a _new_ bug report on bugs.freedesktop.org against DRI -> DRM/Intel
[49087.421097] [drm] drm/i915 developers can then reassign to the right component if it's not a kernel issue.
[49087.421098] [drm] The gpu crash dump is required to analyze gpu hangs, so please always attach it.
[49087.421763] i915: render error detected, EIR: 0x00000010
[49087.421763] i915: page table error
[49087.421763] i915: PGTBL_ER: 0x00000001
[49087.421763] [drm:i915_report_and_clear_eir] *ERROR* EIR stuck: 0x00000010, masking
[49087.421763] i915: render error detected, EIR: 0x00000010
[49087.421763] i915: page table error
[49087.421763] i915: PGTBL_ER: 0x00000001
Anthony
Does anybody want to have a go at a further interpretation of the errors? That would help. Does this error mean a present bit was not set, or the address was wrong, or ...
ron
On Sunday, 2014-06-08 at 16:40 -0700, Anthony Martin wrote:
FWIW, I will occasionally get the same message and the apparent working behavior after resume on a MacBook2,1 using stock Apple EFI firmware. The MacBook uses the same chipset as the X60.
[49087.421096] [drm] GPU crash dump saved to /sys/class/drm/card0/error
[49087.421096] [drm] GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
[49087.421097] [drm] Please file a _new_ bug report on bugs.freedesktop.org against DRI -> DRM/Intel
[49087.421097] [drm] drm/i915 developers can then reassign to the right component if it's not a kernel issue.
[49087.421098] [drm] The gpu crash dump is required to analyze gpu hangs, so please always attach it.
[49087.421763] i915: render error detected, EIR: 0x00000010
[49087.421763] i915: page table error
[49087.421763] i915: PGTBL_ER: 0x00000001
[49087.421763] [drm:i915_report_and_clear_eir] *ERROR* EIR stuck: 0x00000010, masking
[49087.421763] i915: render error detected, EIR: 0x00000010
[49087.421763] i915: page table error
[49087.421763] i915: PGTBL_ER: 0x00000001
That is good to know! Did you do as the messages suggest and submit a bug report? Could you do so please and attach the GPU crash dump too?
Thanks,
Paul
Dear coreboot folks,
On Monday, 2014-06-09 at 00:22 +0200, Paul Menzel wrote:
Since Linux 3.12, the graphics stolen memory seems to be used, which causes a regression with coreboot and native graphics init, at least on the Intel 945 based Lenovo X60 [1].
This seems to be solved by properly setting the register `PGETBL_CTL` [2]. The Linux kernel Intel graphics driver still reports a GTT (Graphics Translation Table) related error during start-up, but everything else seems to work.
[ 1.235640] [drm] GPU crash dump saved to /sys/class/drm/card0/error
[ 1.236583] [drm] GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
[ 1.236583] [drm] Please file a _new_ bug report on bugs.freedesktop.org against DRI -> DRM/Intel
[ 1.236583] [drm] drm/i915 developers can then reassign to the right component if it's not a kernel issue.
[ 1.236583] [drm] The gpu crash dump is required to analyze gpu hangs, so please always attach it.
[ 1.236583] i915: render error detected, EIR: 0x00000010
[ 1.236583] i915: page table error
[ 1.236583] i915: PGTBL_ER: 0x00000013
[ 1.236583] [drm:i915_report_and_clear_eir] *ERROR* EIR stuck: 0x00000010, masking
[ 1.236583] i915: render error detected, EIR: 0x00000010
[ 1.236583] i915: page table error
[ 1.236583] i915: PGTBL_ER: 0x00000013
With #5927 patch set 7 [2], the error is slightly different. (Whether it changes from run to run still needs to be verified.)
[ 1.235596] i915: render error detected, EIR: 0x00000010
[ 1.235596] i915: page table error
[ 1.235596] i915: PGTBL_ER: 0x00000012
[ 1.235596] [drm:i915_report_and_clear_eir] *ERROR* EIR stuck: 0x00000010, masking
[ 1.235596] i915: render error detected, EIR: 0x00000010
[ 1.235596] i915: page table error
[ 1.235596] i915: PGTBL_ER: 0x00000012
[ 1.310633] [drm:intel_modeset_init], 2 display pipes available.
The intel-gpu-tools [6] include the program `intel_error_decode`, which gives the following.
$ ./intel_error_decode /tmp/5927_7/error
Time: 1402269737 s 277725 us
Kernel: 3.14.4-gnuowen
PCI ID: 0x27a2
Detected GEN3 chipset
EIR: 0x00000010
IER: 0x00028053
PGTBL_ER: 0x00000012
        Display A: Invalid GTT PTE
        Host Invalid PTE data
[…]
That register is not documented in the public Intel 945 datasheet [4].
The Intel 965 datasheet [5], page 206, gives the following descriptions.
Bit 1: Valid PTE references illegal memory, such as PAM, SMM or TOM
Bit 4: Invalid GTT Entry during Display A Fetch
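To compare the three `PGTBL_ER` values seen in this thread (0x13, 0x12 and 0x01), a small decoder can list which bits are set. This is only a sketch: it knows just the two bit descriptions quoted above from the 965 datasheet, which may not apply unchanged to the 945, and it reports every other set bit as undescribed. The function and table names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

struct er_bit {
	unsigned int bit;
	const char *desc;
};

/* Only the bits described in this thread (taken from the 965 datasheet). */
static const struct er_bit known_bits[] = {
	{ 1, "Valid PTE references illegal memory, such as PAM, SMM or TOM" },
	{ 4, "Invalid GTT Entry during Display A Fetch" },
};

/* Print one line per set bit and return the mask of set bits that
 * have no description in known_bits. */
uint32_t decode_pgtbl_er(uint32_t value, FILE *out)
{
	uint32_t undescribed = 0;

	for (unsigned int bit = 0; bit < 32; bit++) {
		const char *desc = NULL;

		if (!(value & (UINT32_C(1) << bit)))
			continue;

		for (size_t i = 0; i < sizeof(known_bits) / sizeof(known_bits[0]); i++) {
			if (known_bits[i].bit == bit)
				desc = known_bits[i].desc;
		}

		if (desc) {
			fprintf(out, "bit %u: %s\n", bit, desc);
		} else {
			fprintf(out, "bit %u: (no description in this thread)\n", bit);
			undescribed |= UINT32_C(1) << bit;
		}
	}
	return undescribed;
}
```

With this table, 0x12 decodes to bits 1 and 4 only, while 0x13 and 0x01 both additionally have bit 0 set, for which no description has been quoted here.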
[…]
Thanks,
Paul
[1] https://bugs.freedesktop.org/show_bug.cgi?id=79038
[2] http://review.coreboot.org/5927
[3] http://review.coreboot.org/#/c/5320/9/src/northbridge/intel/i945/gma.c
[4] http://www.intel.com/content/dam/www/public/us/en/documents/datasheets/mobil...
[5] https://01.org/linuxgraphics/sites/default/files/documentation/965_g35_vol_1...
It's painful and awful to do, but when you get that error, I suggest you mod the driver to hexdump the ENTIRE gtt. You need to see what's going on there.
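The dump itself could look roughly like the sketch below. It is a user-space stand-in and not i915 driver code: it assumes the GTT contents have already been copied into an array of 32-bit entries, treats bit 0 as the valid bit, and leaves open how the table is actually read in the driver. The function name is made up.

```c
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hex dump a copy of the GTT, eight entries per line, each line prefixed
 * with the index of its first entry. Returns the number of entries whose
 * valid bit (bit 0, by assumption) is clear. */
size_t dump_gtt(const uint32_t *gtt, size_t entries, FILE *out)
{
	size_t invalid = 0;

	for (size_t i = 0; i < entries; i++) {
		if (i % 8 == 0)
			fprintf(out, "%s%06zx:", i ? "\n" : "", i);
		fprintf(out, " %08" PRIx32, gtt[i]);
		if (!(gtt[i] & 1u))
			invalid++;
	}
	if (entries)
		fputc('\n', out);

	return invalid;
}
```

Comparing such a dump taken right after coreboot's init with one taken when the error fires would show whether entries were overwritten in between.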
It's possible that somehow the gtt and gsm are living in the same place and the chipset is corrupting its own gtts.
ron