I'm using LinuxBIOS on my Tyan s2892. I have a device that maps a lot of the memory space, but I'm struggling to get the Opteron to read and write to my device in larger blocks. I have set the variable MTRRs in the device driver to write-back (confirmed by /proc/mtrr), but I still get 64-bit accesses instead of 64-byte (cache-line) accesses.
Any suggestions?
Thanks,
Myles
Write-back or write-combining?
YH
I did write-back, because I would like it to be treated as much like DRAM as possible.
Myles
Hi Myles, I am not an Opteron expert but here are a few items you might want to check.
I assume that this is a PCI/E/X device.
- Are the device memory BAR and bridge memory set as prefetchable?
- Check that you don't have any overlapping MTRRs. I think that they resolve to the most restrictive setting.
- Check the PAT setting for that memory. The memory page will resolve to the most restrictive setting. (Note: I am not sure if Linux uses PAT.)
See section 7.6 of the System Programmers Guide Volume 2. You can get it here: http://www.amd.com/gb-uk/Processors/TechnicalResources/0,,30_182_739_7044,00...
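One way to act on the overlapping-MTRR check is to parse the text of /proc/mtrr and compare the ranges pairwise. The following is a minimal sketch, not a definitive tool: the function names `parse_mtrr` and `overlaps` are made up for illustration, the sample text is invented rather than taken from the s2892, and the regular expression assumes the usual `regNN: base=0x... (...MB), size=...MB` line layout.

```python
import re
from itertools import combinations

# Assumed layout of one /proc/mtrr line; the memory type may come before
# or after "count=", so it is searched for separately.
LINE = re.compile(
    r"reg(\d+): base=0x([0-9a-fA-F]+) \(\s*\d+MB\), size=\s*(\d+)([KMG])B"
)
TYPES = ("uncachable", "write-combining", "write-through",
         "write-protect", "write-back")
UNIT = {"K": 1 << 10, "M": 1 << 20, "G": 1 << 30}


def parse_mtrr(text):
    """Return a list of (register, base, size_in_bytes, type) tuples."""
    regions = []
    for line in text.splitlines():
        m = LINE.search(line)
        if not m:
            continue
        base = int(m.group(2), 16)
        size = int(m.group(3)) * UNIT[m.group(4)]
        mem_type = next((t for t in TYPES if t in line), "unknown")
        regions.append((int(m.group(1)), base, size, mem_type))
    return regions


def overlaps(regions):
    """Return pairs of register numbers whose address ranges intersect."""
    found = []
    for a, b in combinations(regions, 2):
        if a[1] < b[1] + b[2] and b[1] < a[1] + a[2]:
            found.append((a[0], b[0]))
    return found


# Invented sample, NOT real output from the board discussed here.
SAMPLE = """\
reg00: base=0x00000000 (   0MB), size=2048MB: write-back, count=1
reg01: base=0x80000000 (2048MB), size=1024MB: write-back, count=1
reg02: base=0xa0000000 (2560MB), size= 256MB: write-combining, count=1
"""
```

On a live system the same functions could be fed the contents of /proc/mtrr instead of `SAMPLE`. In the invented sample, reg02 lies inside reg01, so that pair is the one reported.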
You might get some performance by caching the device memory space, but I don't think that it is the best use of the cache. If you are moving a lot of memory between your device and system memory, it is best that the driver DMA it in instead of reading it in with the CPU.
If you figure out the problem please let us know.
Thanks, Marc
> I assume that this is a PCI/E/X device. Is the device memory BAR and bridge memory set as prefetchable?
It is HyperTransport, but pretty much the same. I set it as prefetchable. The reads actually work; it's the writes that come through 8 bytes at a time.
> Check that you don't have any overlapping mtrrs. I think that they resolve to the most restrictive setting. Check the PAT setting for that memory. The memory page will resolve to the most restrictive setting. (Note I am not sure if Linux uses PAT) See section 7.6 of the System Programmers Guide Volume 2. You can get here:
> http://www.amd.com/gb-uk/Processors/TechnicalResources/0,,30_182_739_7044,00.html
I'll look.
> You might get some performance by caching the device memory space but I don't think that it is the best use of the cache. If you are moving a lot memory between your device and system memory it is best that the driver DMA it in instead of reading it in with the CPU.
I'm using the device to explore paging activity, so I'm not after speed so much as trying to make the device seem as much like RAM as possible.
Thanks for your help, Myles
Some time ago I officially asked an AMD rep about this. Someone from AMD unofficially said that MemIO can only be Write-Combined, and I gave up on caching.
Below is what I asked. Maybe you can spot what I missed. Please let me know if you succeed.
Roman
--------------------------------------------------------------------
I have a difficulty related to cacheability of Memory-Mapped I/O memory in Opteron. Can you please assist me?
I have an Opteron-based system, where the Opteron and an FPGA are connected using a HyperTransport link. As a target of HT traffic, the FPGA responds to a certain range of physical addresses. I am trying to configure the CPU and the North Bridge to treat the FPGA as Writeback memory. I cannot do this.
I've been able to make the FPGA Uncacheable, Write-Combining, Write-Protect and Writethrough. When I try to make it Writeback it behaves as if it were Writethrough.
Is it possible to make the FPGA Writeback? If yes, what should I change?
I have observability of the HT traffic from the FPGA point of view, so I know what is written and what is requested by the CPU.
Here is how I program the Opteron:
- The processor is running in the Long Mode with paging enabled.
- A pair of NB PCI Function 1, Memory-Mapped I/O Base and Limit registers is programmed with the FPGA physical address range, proper link and node IDs, posted.
- A pair of the variable MTRR Base and Mask (MSR 0x200-0x20f) registers is programmed with the FPGA physical address range, valid bit and the desired caching method. Another pair of MTRR registers describes the DRAM, all others are disabled.
- The PAT register (MSR 0x277) is 0x0606060606060606ull.
- The Page Table Entry for the virtual address mapped to the FPGA has PAT, PCD and PWT bits cleared.
- Bit CD in the CR0 register is cleared.
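The register values described above can be sanity-checked with a few lines of arithmetic. This is a minimal sketch under stated assumptions, not the code actually used on the Opteron: the helper names (`decode_pat`, `pat_index`, `mtrr_pair`) are hypothetical, the 40-bit physical address width is an assumption about this processor generation, and the example base and size are invented. The type encodings (UC=0, WC=1, WT=4, WP=5, WB=6, and UC-=7 for PAT only) and the field layout (type in bits 7:0 of the base register, valid in bit 11 of the mask register) follow the x86 architecture definition.

```python
# Memory type encodings shared by MTRRs and PAT (7 = UC- is PAT-only).
MEMORY_TYPES = {0: "UC", 1: "WC", 4: "WT", 5: "WP", 6: "WB", 7: "UC-"}
MTRR_TYPE_WB = 6
MTRR_VALID = 1 << 11  # valid bit in the mask register


def decode_pat(pat):
    """Decode the eight PAT entries (low 3 bits of each byte) into names."""
    return [MEMORY_TYPES.get((pat >> (8 * i)) & 0x07, "reserved")
            for i in range(8)]


def pat_index(pat_bit, pcd, pwt):
    """PAT entry selected by the PAT, PCD and PWT bits of a page table entry."""
    return (pat_bit << 2) | (pcd << 1) | pwt


def mtrr_pair(base, size, mem_type, phys_bits=40):
    """Compute variable MTRR base/mask values for a power-of-two range.

    phys_bits=40 is an assumption; check the CPU's reported address width.
    """
    if size < 4096 or size & (size - 1):
        raise ValueError("size must be a power of two, at least 4 KiB")
    if base % size:
        raise ValueError("base must be aligned to size")
    addr_mask = ((1 << phys_bits) - 1) & ~0xFFF
    phys_base = (base & addr_mask) | mem_type
    phys_mask = (~(size - 1) & addr_mask) | MTRR_VALID
    return phys_base, phys_mask


# PAT value quoted in the message: every entry decodes to write-back.
pat_entries = decode_pat(0x0606060606060606)

# Invented example range: 256 MiB of device memory at 4 GiB.
example_base, example_mask = mtrr_pair(0x1_0000_0000, 0x1000_0000, MTRR_TYPE_WB)
```

With PAT, PCD and PWT all cleared, `pat_index(0, 0, 0)` selects entry 0, which the quoted PAT value maps to WB. That is consistent with the description: neither the PAT nor the page table entry appears to be what downgrades the range.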
I'm having the same trouble. Even though the MTRRs are set for write-back, it acts as if it were write-through. I'll keep looking, because it's important to my project.
Thanks for your help and suggestions, Myles
I am not an expert but these are my thoughts....
Now that I think about it, it makes sense that you can't set writeback to noncoherent (nonsystem) memory space. What if another device wants to write to that memory? There is no way for the cache to snoop that it was written. You would need a coherent HT link to your FPGA to get the cache snoop messaging.
Marc
-----Original Message-----
From: linuxbios-bounces@linuxbios.org [mailto:linuxbios-bounces@linuxbios.org] On Behalf Of Marc Jones
Sent: Thursday, June 14, 2007 2:39 PM
To: Roman Kononov
Cc: myles@mouselemur.cs.byu.edu; linuxbios@linuxbios.org
Subject: Re: [LinuxBIOS] Opteron caching of device memory
I am not an expert but these are my thoughts....
Now that I think about it, it makes sense that you can't set writeback to noncoherent (nonsystem) memory space. What if another device wants to write to that memory? There is no way for the cache to snoop that it was written. You would need a coherent HT link to your FPGA to get the cache snoop messaging.
Marc
I think that the FPGA should be able to set the coherent bit and cause snooping, even if it is not on the coherent bus, because it targets coherent memory.
An example is DMA:
1. The processor writes data to DRAM (some of it may/will still be cached)
2. The processor writes to the DMA controller (say a SATA controller)
3. The DMA transfers the data to the disk drive
If there's no snooping, the DMA controller reads stale data from the DRAM.
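The DMA scenario can be played out in a toy model. This is only an illustrative sketch: `ToySystem` and its methods are made-up names, the "cache" is a plain dictionary of dirty lines, and nothing here models real HyperTransport or Opteron behavior.

```python
class ToySystem:
    """Toy model: one write-back cache in front of DRAM, plus a DMA reader."""

    def __init__(self):
        self.dram = {}   # address -> value as stored in memory
        self.cache = {}  # address -> dirty value not yet written back

    def cpu_write(self, addr, value):
        # Write-back policy: the store lands in the cache only.
        self.cache[addr] = value

    def evict(self, addr):
        # Writing the dirty line back makes DRAM current again.
        if addr in self.cache:
            self.dram[addr] = self.cache.pop(addr)

    def dma_read(self, addr, snoop):
        # With snooping, the cache is probed and supplies the dirty data.
        if snoop and addr in self.cache:
            return self.cache[addr]
        # Without snooping, the device sees whatever DRAM holds.
        return self.dram.get(addr)


system = ToySystem()
system.dram[0x1000] = "old"
system.cpu_write(0x1000, "new")               # step 1: data still cached
stale = system.dma_read(0x1000, snoop=False)  # device sees the old value
fresh = system.dma_read(0x1000, snoop=True)   # probe returns the new value
```

In this model the unsnooped read returns the old DRAM contents, which is the stale-data case described above, while the snooped read returns what the processor last wrote.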
Feel free to point out shortcomings in my example. I need to learn more about it.
Myles
I am sorry, I don't have the expertise to answer your question. I recommend that you try the HyperTransport Consortium, and if you still can't get an answer, let me know. I will try to find the AMD Torrenza guys to get us an answer.
http://www.hypertransport.org/
http://enterprise.amd.com/us-en/AMD-Business/Technology-Home/Torrenza.aspx
Marc
On 6/14/07, Marc Jones Marc.Jones@amd.com wrote:
> Now that I think about it, it makes sense that you can't set writeback to noncoherent (nonsystem) memory space. What if another device wants to write to that memory? There is no way for the cache to snoop that it was written. You would need a coherent HT link to your FPGA to get the cache snoop messaging.
Also, writeback would be a disaster for drivers. The expectation for a driver is that when the code sets a bit in an MMIO space, that bit gets set. That's a writethrough semantic.
thanks
ron
I know for sure that the PowerPC architecture has no problems with caching anything, regardless of whether it is SMP or not. Cache coherency is somewhat "religious" for x86.
If my driver and only my driver accesses a device, why can't the device be cached? If needed, I can use clflush instructions to synchronize the cache and the device.
Regards, Roman
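The two positions in this exchange (a driver expects a register write to reach the device immediately; a driver could cache the device and flush explicitly) can be contrasted in a toy model. This is a sketch only: `ToyDevice`, `ToyCpu`, and the `flush` method are hypothetical names, and the model says nothing about what the Opteron hardware actually permits.

```python
class ToyDevice:
    """Stands in for an MMIO device; holds the values it has actually seen."""

    def __init__(self):
        self.registers = {}


class ToyCpu:
    """CPU side with a selectable write policy for the device range."""

    def __init__(self, device, policy):
        if policy not in ("write-through", "write-back"):
            raise ValueError("unknown policy: " + policy)
        self.device = device
        self.policy = policy
        self.dirty = {}  # stores held in the cache, not yet at the device

    def store(self, reg, value):
        if self.policy == "write-through":
            # The store reaches the device right away.
            self.device.registers[reg] = value
        else:
            # The store is held back until an eviction or explicit flush.
            self.dirty[reg] = value

    def flush(self, reg):
        # Explicit synchronization, as a driver would have to request it.
        if reg in self.dirty:
            self.device.registers[reg] = self.dirty.pop(reg)


wt_device = ToyDevice()
ToyCpu(wt_device, "write-through").store("control", 1)

wb_device = ToyDevice()
wb_cpu = ToyCpu(wb_device, "write-back")
wb_cpu.store("control", 1)
seen_before_flush = wb_device.registers.get("control")
wb_cpu.flush("control")
seen_after_flush = wb_device.registers.get("control")
```

Under the write-through policy the device sees the store immediately. Under write-back it sees nothing until the flush, so a driver written for the first semantic would need an explicit flush after every store the device must observe.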