Sorry for the off-topic post, but from time to time I see technical people from amd.com writing to this list, so I decided to try my luck. I would be very grateful for any help with the following issue, or for a referral to the proper technical contacts.
Starting with the upcoming 8.0 release, FreeBSD has transparent support for large pages (called superpages in FreeBSD). This means that an eligible range of 4KB pages can be promoted to a single 2MB page on amd64 at an opportune moment. Of course, a large page can also be broken back down into normal pages when needed. Large pages can also be allocated explicitly, but that is beside the point.
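For readers unfamiliar with promotion, the eligibility test can be sketched roughly as follows. This is a simplified model, not FreeBSD's pmap code: the `pte_t` layout and `can_promote()` are hypothetical, and the sketch assumes amd64 geometry, where one page table holds 512 4KB entries covering a 2MB-aligned virtual range.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SIZE_4K   0x1000ULL
#define NPTEPG         512                          /* 4KB entries per page table */
#define SUPERPAGE_SIZE (PAGE_SIZE_4K * NPTEPG)      /* 2MB */

/* Hypothetical, simplified PTE: physical frame address plus attribute bits. */
typedef struct {
    uint64_t pa;     /* physical address of the 4KB frame */
    uint32_t attrs;  /* protection/caching bits; must match across the range */
    bool     valid;
} pte_t;

/* Returns true if the 512 PTEs of one page table could be replaced by a
 * single 2MB mapping: all valid, physically contiguous, starting on a 2MB
 * physical boundary, and with identical attributes. The virtual range is
 * 2MB-aligned by construction, since one page table covers exactly 2MB. */
static bool can_promote(const pte_t pt[NPTEPG])
{
    if (!pt[0].valid || (pt[0].pa & (SUPERPAGE_SIZE - 1)) != 0)
        return false;
    for (size_t i = 1; i < NPTEPG; i++) {
        if (!pt[i].valid ||
            pt[i].pa != pt[0].pa + i * PAGE_SIZE_4K ||
            pt[i].attrs != pt[0].attrs)
            return false;
    }
    return true;
}
```

Checking the attributes of every entry matters: promotion must not merge pages whose protections or caching modes differ.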
We seem to have a problem that appears to be caused by the superpages feature mentioned above, perhaps because our code is not strict enough. However, the problem manifests itself only on AMD family 10h processors. To be precise, we have reports that family 0Fh is not affected, all problem reports are for family 10h, and we have no reports, positive or negative, for family 11h. Another mandatory condition for the problem to manifest itself is that machine check reporting is enabled, by either the BIOS or the OS. Also, the problem has been reported only in long mode.
The problem manifests itself as a machine check reporting a parity error in the L1 data TLB (DC TLB L1). All reporters have confirmed that they do not experience any problems when the superpages feature is turned off. So it seems likely that this machine check report does not indicate a real hardware fault.
The way our code currently works, it seems possible to get into a situation where two TLB entries exist for the same linear-to-physical translation: one through a large page and another through a normal page. Most likely both would be correct, that is, both would point to the same physical location. Could such a situation lead the integrity-checking logic to conclude that there is a parity error in the TLB? I've searched through the errata for family 10h processors but couldn't find one that matches.
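To make the suspected state concrete, here is a toy software model of a TLB holding both a 2MB entry and a 4KB entry that cover the same address with the same translation. The `tlb_entry_t` type and both helpers are hypothetical; the real TLB organization of family 10h processors is not modeled here.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SIZE_4K 0x1000ULL
#define SIZE_2M 0x200000ULL

/* Hypothetical software model of a TLB entry; real hardware differs. */
typedef struct {
    uint64_t va;    /* virtual base of the mapped page */
    uint64_t pa;    /* physical base of the mapped page */
    uint64_t size;  /* SIZE_4K or SIZE_2M */
    bool     valid;
} tlb_entry_t;

/* Count the valid entries whose page contains va. A consistent TLB holds
 * at most one; the duplicate situation yields two. The unsigned
 * subtraction wraps for va < base, so such entries never match. */
static size_t tlb_hits(const tlb_entry_t *tlb, size_t n, uint64_t va)
{
    size_t hits = 0;
    for (size_t i = 0; i < n; i++)
        if (tlb[i].valid && va - tlb[i].va < tlb[i].size)
            hits++;
    return hits;
}

/* Translate va through one entry (the caller ensures the entry covers va). */
static uint64_t tlb_translate(const tlb_entry_t *e, uint64_t va)
{
    return e->pa + (va - e->va);
}
```

With a 2MB entry for 0x200000 and a 4KB entry for 0x203000 both present, `tlb_hits()` reports two matches for an address inside the small page, even though both entries translate it to the same physical address.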
Examples of affected processors (as reported by the FreeBSD kernel):

1)
CPU: AMD Athlon(tm) II X2 250 Processor (3013.75-MHz K8-class CPU)
  Origin = "AuthenticAMD"  Id = 0x100f62  Stepping = 2
  Features=0x178bfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,MMX,FXSR,SSE,SSE2,HTT>
  Features2=0x802009<SSE3,MON,CX16,POPCNT>
  AMD Features=0xee500800<SYSCALL,NX,MMX+,FFXSR,Page1GB,RDTSCP,LM,3DNow!+,3DNow!>
  AMD Features2=0x37ff<LAHF,CMP,SVM,ExtAPIC,CR8,ABM,SSE4A,MAS,Prefetch,OSVW,IBS,SKINIT,WDT>

2)
CPU: Quad-Core AMD Opteron(tm) Processor 2352 (2100.09-MHz K8-class CPU)
  Origin = "AuthenticAMD"  Id = 0x100f23  Stepping = 3
  Features=0x178bfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,MMX,FXSR,SSE,SSE2,HTT>
  Features2=0x802009<SSE3,MON,CX16,POPCNT>
  AMD Features=0xee400800<SYSCALL,MMX+,FFXSR,Page1GB,RDTSCP,LM,3DNow!+,3DNow!>
  AMD Features2=0x7ff<LAHF,CMP,SVM,ExtAPIC,CR8,ABM,SSE4A,MAS,Prefetch,OSVW,IBS>
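For reference, both `Id` values above (the CPUID Fn0000_0001 EAX signature) decode to family 10h. A small sketch of the decoding, assuming the AMD convention that the extended family and extended model fields apply only when the base family is 0Fh; the function names are illustrative.

```c
#include <stdint.h>

/* Effective family: base family (bits 11:8), plus extended family
 * (bits 27:20) when the base family is 0Fh. */
static unsigned cpu_family(uint32_t id)
{
    unsigned base = (id >> 8) & 0xf;
    return base == 0xf ? base + ((id >> 20) & 0xff) : base;
}

/* Effective model: base model (bits 7:4), with extended model
 * (bits 19:16) as the high nibble when the base family is 0Fh. */
static unsigned cpu_model(uint32_t id)
{
    unsigned base = (id >> 4) & 0xf;
    unsigned base_family = (id >> 8) & 0xf;
    return base_family == 0xf ? (((id >> 16) & 0xf) << 4) | base : base;
}

/* Stepping: bits 3:0. */
static unsigned cpu_stepping(uint32_t id)
{
    return id & 0xf;
}
```

For example, 0x100f62 decodes to family 0x10, model 6, stepping 2, and 0x100f23 to family 0x10, model 2, stepping 3, matching the `Stepping` fields printed by the kernel.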
This is how the FreeBSD MCA code reported the machine check:

MCA: CPU 5 UNCOR PCC OVER DTLB L1 error
MCA: Address 0x80e5c8000
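The first log line is a rendering of the bank's MCi_STATUS register: UNCOR is the UC bit (61), PCC is bit 57, OVER is bit 62, and "DTLB L1" comes from the TLB form of the architectural MCA error code (0000 0000 0001 TTLL, with TT=01 for data and LL=01 for level 1). The address line comes from MCi_ADDR. The sketch below decodes a status value into the same text; `mca_describe()` and its output format are illustrative, modeled on the log above rather than taken from FreeBSD's MCA code.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Architectural MCi_STATUS bits (AMD64 / Intel 64). */
#define MC_STATUS_VAL   (1ULL << 63)
#define MC_STATUS_OVER  (1ULL << 62)
#define MC_STATUS_UC    (1ULL << 61)
#define MC_STATUS_ADDRV (1ULL << 58)
#define MC_STATUS_PCC   (1ULL << 57)

/* Format a status word like the log line above. Only TLB error codes
 * (0000 0000 0001 TTLL) are decoded; anything else is printed raw. */
static void mca_describe(uint64_t status, char *buf, size_t len)
{
    static const char *ttype[] = { "I", "D", "G", "?" };   /* TT field */
    static const char *level[] = { "L0", "L1", "L2", "LG" }; /* LL field */
    uint16_t code = status & 0xffff;

    snprintf(buf, len, "%s%s%s",
             (status & MC_STATUS_UC) ? "UNCOR " : "COR ",
             (status & MC_STATUS_PCC) ? "PCC " : "",
             (status & MC_STATUS_OVER) ? "OVER " : "");
    size_t used = strlen(buf);
    if ((code & 0xfff0) == 0x0010)
        snprintf(buf + used, len - used, "%sTLB %s error",
                 ttype[(code >> 2) & 3], level[code & 3]);
    else
        snprintf(buf + used, len - used, "other error 0x%04x", code);
}
```

With UC, PCC and OVER set and error code 0x0015, this yields "UNCOR PCC OVER DTLB L1 error".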
My guess at the possible issue in the FreeBSD code: the 4K mappings are not flushed from the TLB when the corresponding PDE is changed from pointing to a page table to mapping a 2M page.
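If that guess is right, a fix would have to ensure that no 4K translation for the range can still be cached once the 2M mapping is in place. Below is a sketch of one conservative ordering in a toy model: clear the PDE, invalidate the whole 2MB range, then install the new PDE. Everything here (`tlb_entry_t`, `promote_pde()`, the software TLB) is hypothetical; real code would use INVLPG or a full TLB flush, and would also need shootdowns on other CPUs, which the model ignores.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SIZE_4K 0x1000ULL
#define SIZE_2M 0x200000ULL

/* Hypothetical software TLB entry; layout is illustrative only. */
typedef struct {
    uint64_t va, pa, size;
    bool valid;
} tlb_entry_t;

/* Invalidate every cached entry overlapping [base, base + SIZE_2M).
 * Returns the number of entries dropped. */
static size_t tlb_invalidate_2m(tlb_entry_t *tlb, size_t n, uint64_t base)
{
    size_t dropped = 0;
    for (size_t i = 0; i < n; i++) {
        if (tlb[i].valid &&
            tlb[i].va < base + SIZE_2M && base < tlb[i].va + tlb[i].size) {
            tlb[i].valid = false;
            dropped++;
        }
    }
    return dropped;
}

/* Model of a promotion sequence (the PDE is plain memory in this model):
 * 1. clear the PDE so no new translation for the range can be cached,
 * 2. drop every cached entry for the range,
 * 3. install the new 2MB PDE.
 * A small-page and a large-page entry for the same address never coexist. */
static size_t promote_pde(uint64_t *pde, uint64_t new_pde,
                          tlb_entry_t *tlb, size_t n, uint64_t base)
{
    *pde = 0;
    size_t dropped = tlb_invalidate_2m(tlb, n, base);
    *pde = new_pde;
    return dropped;
}
```

Entries just outside the 2MB range, such as one starting at `base + SIZE_2M`, are left untouched.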
Thank you.