17 Feb 15:28 2011 Jonathan Kollasch wrote:
]I have it (mostly) working now, just have to get it polished up a bit. ] ] Jonathan Kollasch
Hello Jonathan,
Right now I am trying to debug the same AHCI problem. The Host To Device FIS length change also gets me past the first polling loop in ahci_command, thanks for the tip. Do you have any fixes beyond that one I should try?
Thanks, Scott
On Fri, Apr 22, 2011 at 02:21:08AM -0500, Scott Duplichan wrote:
17 Feb 15:28 2011 Jonathan Kollasch wrote:
]I have it (mostly) working now, just have to get it polished up a bit. ] ] Jonathan Kollasch
Hello Jonathan,
Right now I am trying to debug the same AHCI problem. The Host To Device FIS length change also gets me past the first polling loop in ahci_command, thanks for the tip. Do you have any fixes beyond that one I should try?
Yeah, I just didn't like them well enough to post a patch.
Attached is what I have. Last I tried it a few months ago it mostly worked for ATA disks but not ATAPI devices. IIRC the key changes are in the port probe and reset functions.
Jonathan Kollasch
On Fri, Apr 22, 2011 at 02:21:08AM -0500, 2011 Jonathan Kollasch wrote:
]> ]I have it (mostly) working now, just have to get it polished up a bit. ]> ] ]> ] Jonathan Kollasch ]> ]> Hello Jonathan, ]> ]> Right now I am trying to debug the same AHCI problem. The Host To Device ]> FIS length change also gets me past the first polling loop in ahci_command, ]> thanks for the tip. Do you have any fixes beyond that one I should try? ] ]Yeah, I just didn't like them well enough to post a patch. ] ]Attached is what I have. Last I tried it a few months ago it mostly ]worked for ATA disks but not ATAPI devices. IIRC the key changes are ]in the port probe and reset functions. ] ] Jonathan Kollasch
Hello Jonathan,
Thank you so much. I will spend some time and try to add to your work.
Thanks, Scott
On Friday, April 22, 2011 05:19 AM Jonathan A. Kollasch wrote:
]> thanks for the tip. Do you have any fixes beyond that one I should try? ] ]Yeah, I just didn't like them well enough to post a patch. ] ]Attached is what I have. Last I tried it a few months ago it mostly ]worked for ATA disks but not ATAPI devices. IIRC the key changes are ]in the port probe and reset functions. ] ] Jonathan Kollasch
Hello Jonathan,
Thanks. I made some progress. Though there are some quirks, the attached patch gets AHCI working well enough on my AMD Fusion board to install Windows 7 from DVD. I installed it on both a regular SATA drive and on a SATA SSD drive. There are at least 2 remaining problems:
1) When booting a DOS drive, a disk read error occurs at some point after autoexec executes. 2) DOS booting is slower by 100-200 ms. While that isn't much in absolute terms, it is a big percentage increase from the 642 ms I can get using the IDE interface.
Windows 7 setup loads from DVD noticeable faster when AHCI is used. Windows 7 boot with AHCI and an SSD disk is impressive. Using AHCI and completely hiding the IDE device keeps Windows from wasting time with uninstalled drives while booting. Starting from power off with the Windows 7 SSD drive, the desktop background appears after 15 seconds and the desktop is complete after around 20 seconds.
Thanks, Scott
--- seabios-0.6.2-original\src\ahci.c Mon Feb 28 21:10:57 2011 +++ seabios-0.6.2\src\ahci.c Sun Apr 24 01:12:20 2011 @@ -105,7 +105,7 @@ static int ahci_command(struct ahci_port_s *port, int iswrite, int isatapi, void *buffer, u32 bsize) { - u32 val, status, success, flags; + u32 val, status, flags, intbits, error; struct ahci_ctrl_s *ctrl = GET_GLOBAL(port->ctrl); struct ahci_cmd_s *cmd = GET_GLOBAL(port->cmd); struct ahci_fis_s *fis = GET_GLOBAL(port->fis); @@ -118,19 +118,24 @@ SET_FLATPTR(cmd->prdt[0].baseu, 0); SET_FLATPTR(cmd->prdt[0].flags, bsize-1);
+ // clear interrupt status + val = ahci_port_readl(ctrl, pnr, PORT_IRQ_STAT); + ahci_port_writel(ctrl, pnr, PORT_IRQ_STAT, val); + + // toggle the start control val = ahci_port_readl(ctrl, pnr, PORT_CMD); + ahci_port_writel(ctrl, pnr, PORT_CMD, val & ~(PORT_CMD_START | PORT_CMD_FIS_RX)); ahci_port_writel(ctrl, pnr, PORT_CMD, val | PORT_CMD_START);
if (ahci_port_readl(ctrl, pnr, PORT_CMD_ISSUE)) return -1;
flags = ((1 << 16) | /* one prd entry */ - (1 << 10) | /* clear busy on ok */ (iswrite ? (1 << 6) : 0) | (isatapi ? (1 << 5) : 0) | - (4 << 0)); /* fis length (dwords) */ + (5 << 0)); /* fis length (dwords) */ SET_FLATPTR(list[0].flags, flags); - SET_FLATPTR(list[0].bytes, bsize); + SET_FLATPTR(list[0].bytes, 0); SET_FLATPTR(list[0].base, ((u32)(cmd))); SET_FLATPTR(list[0].baseu, 0);
@@ -138,19 +143,39 @@ SET_FLATPTR(fis->rfis[2], 0); ahci_port_writel(ctrl, pnr, PORT_SCR_ACT, 1); ahci_port_writel(ctrl, pnr, PORT_CMD_ISSUE, 1); - while (ahci_port_readl(ctrl, pnr, PORT_CMD_ISSUE)) { - yield(); - } - while ((status = GET_FLATPTR(fis->rfis[2])) == 0) { + while (1) { + intbits = ahci_port_readl(ctrl, pnr, PORT_IRQ_STAT) & 0x3f; + if (intbits) break; yield(); } - - success = (0x00 == (status & (ATA_CB_STAT_BSY | ATA_CB_STAT_DF | - ATA_CB_STAT_DRQ | ATA_CB_STAT_ERR)) && - ATA_CB_STAT_RDY == (status & (ATA_CB_STAT_RDY))); - dprintf(2, "AHCI/%d: ... finished, status 0x%x, %s\n", pnr, - status, success ? "OK" : "ERROR"); - return success ? 0 : -1; + status = 0xFFFF; + error = -2; + if (intbits == 0x01) { // Device to Host Register FIS Interrupt (DHRS) + status = GET_FLATPTR(fis->rfis[2]); + error = GET_FLATPTR(fis->rfis[3]); + } + if (intbits == 0x02) { // PIO Setup FIS Interrupt (PSS) + status = GET_FLATPTR(fis->psfis[2]); + error = GET_FLATPTR(fis->psfis[3]); + } + if (intbits == 0x04) { // DMA Setup FIS Interrupt (DSS) + status = GET_FLATPTR(fis->dsfis[2]); + error = GET_FLATPTR(fis->dsfis[3]); + } + if (intbits == 0x08) { // Set Device Bits Interrupt (SDBS) + status = GET_FLATPTR(fis->sdbfis[2]); + error = GET_FLATPTR(fis->sdbfis[3]); + } + if (intbits == 0x10) { // Unknown FIS Interrupt (UFS) + status = GET_FLATPTR(fis->ufis[2]); + error = GET_FLATPTR(fis->ufis[3]); + } + + if (!error) + dprintf(2, "AHCI/%d: ... finished, status 0x%02x OK\n", pnr, status); + else + dprintf(2, "AHCI/%d: ... finished, status 0x%02x ERROR %02x\n", pnr, status, error); + return error; }
#define CDROM_CDB_SIZE 12 @@ -172,7 +197,7 @@ } rc = ahci_command(port, 0, 1, op->buf_fl, op->count * blocksize); - if (rc < 0) + if (rc) return DISK_RET_EBADTRACK; return DISK_RET_SUCCESS; } @@ -191,7 +216,7 @@ op->count * DISK_SECTOR_SIZE); dprintf(2, "ahci disk %s, lba %6x, count %3x, buf %p, rc %d\n", iswrite ? "write" : "read", (u32)op->lba, op->count, op->buf_fl, rc); - if (rc < 0) + if (rc) return DISK_RET_EBADTRACK; return DISK_RET_SUCCESS; } @@ -281,22 +306,10 @@ static int ahci_port_probe(struct ahci_ctrl_s *ctrl, u32 pnr) { - u32 val, count = 0; - - val = ahci_port_readl(ctrl, pnr, PORT_TFDATA); - while (val & ((1 << 7) /* BSY */ | - (1 << 3) /* DRQ */)) { - ndelay(500); - val = ahci_port_readl(ctrl, pnr, PORT_TFDATA); - count++; - if (count >= AHCI_MAX_RETRIES) - return -1; - } - - val = ahci_port_readl(ctrl, pnr, PORT_SCR_STAT); - if ((val & 0x07) != 0x03) - return -1; - return 0; + u32 val = ahci_port_readl(ctrl, pnr, PORT_SCR_STAT) & 7; + if (val == 3) // Device presence detected and Phy communication established + return 0; + return -1; }
#define MAXMODEL 40 @@ -340,7 +353,7 @@ port->atapi = 0; sata_prep_simple(&port->cmd->fis, ATA_CMD_IDENTIFY_DEVICE); rc = ahci_command(port, 0, 0, buffer, sizeof(buffer)); - if (rc < 0) + if (rc) goto err; }
Scott Duplichan wrote:
]1) When booting a DOS drive, a disk read error occurs at some point ]after autoexec executes.
The revised patch (attached) overcomes this problem. It turns out in the latter stage of booting, DOS makes a couple of INT13 read requests with a buffer that is not word aligned. AHCI only supports word aligned buffers. This was causing the data to be shifted by one byte for these reads. The patch adds unaligned support for reads only, which is good enough for the DOS boot problem. The method of this patch actually writes one byte beyond the end of the caller's buffer. But the only OS seen to do this is DOS, so it may be good enough this way.
]2) DOS booting is slower by 100-200 ms. While that isn't much in ]absolute terms, it is a big percentage increase from the 642 ms I ]can get using the IDE interface.
Now AHCI DOS boot is slower than IDE DOS boot by only 40 ms, so I am satisfied and have switched my project over to AHCI mode.
Thanks, Scott
Scott Duplichan wrote:
]1) When booting a DOS drive, a disk read error occurs at some point ]after autoexec executes.
The revised patch (attached) overcomes this problem. It turns out in the latter stage of booting, DOS makes a couple of INT13 read requests with a buffer that is not word aligned. AHCI only supports word aligned buffers. This was causing the data to be shifted by one byte for these reads. The patch adds unaligned support for reads only, which is good enough for the DOS boot problem. The method of this patch actually writes one byte beyond the end of the caller's buffer. But the only OS seen to do this is DOS, so it may be good enough this way.
Still it's not nice to write outside the callers buffer. Another OS might call same function and SeaBIOS would end up corrupting some variable. Ungood. I guess memmove() is the only choice?
//Peter
Peter Stuge wrote:
]Still it's not nice to write outside the callers buffer. Another OS ]might call same function and SeaBIOS would end up corrupting some ]variable. Ungood. I guess memmove() is the only choice?
]//Peter
I had a couple of ideas for a more sound solution. Debugging them is a challenge for me because I am new to seabios development and its method for writing code that works correctly in both 16-bit and 32-bit mode. Anyway, the first method I tried was to allocate a temporary I/O buffer that has the proper alignment. While that creates overhead, it would only be invoked in the rare case of unaligned request. But then there is the possibility that the allocation function will not be able to satisfy the request. Another possibility is splitting the request. The caller's buffer could handle the bigger part, and a stack buffer could be used for the remaining part. Yet another possibility is to backup the byte that gets overwritten by the current patch method and restore it afterwards. While that is an unreliable method for paged code, it might work here. The only danger I can think of is if the caller's buffer is at the end of a physical DRAM range.
Thanks, Scott
Scott Duplichan wrote:
Another possibility is splitting the request. The caller's buffer could handle the bigger part, and a stack buffer could be used for the remaining part.
I think this idea is by far the best!
//Peter
Peter Stuge wrote:
Scott Duplichan wrote:
Another possibility is splitting the request. The caller's buffer could handle the bigger part, and a stack buffer could be used for the remaining part.
] I think this idea is by far the best!
I started coding this and ran into a problem. According to Kevin the stack is too small for a one sector buffer. So I was going to use an aligned subset of the caller's buffer to read the last sector and save its last word in a stack variable. But if a single sector is requested and the buffer is unaligned, aligning it results in a buffer too small to read a complete sector.
Thanks, Scott
On Mon, Apr 25, 2011 at 05:31:36PM -0500, Scott Duplichan wrote:
Peter Stuge wrote:
]Still it's not nice to write outside the callers buffer. Another OS ]might call same function and SeaBIOS would end up corrupting some ]variable. Ungood. I guess memmove() is the only choice?
]//Peter
I had a couple of ideas for a more sound solution.
This came up for DMA on IDE too. What I did there was just fallback to PIO if the buffer was not aligned.
Out of curiosity, can you see what happens if you return DISK_RET_EBOUNDARY in the unaligned case?
Debugging them is a challenge for me because I am new to seabios development and its method for writing code that works correctly in both 16-bit and 32-bit mode. Anyway, the first method I tried was to allocate a temporary I/O buffer that has the proper alignment. While that creates overhead, it would only be invoked in the rare case of unaligned request. But then there is the possibility that the allocation function will not be able to satisfy the request.
It's not currently possible to dynamically allocate memory during runtime. However, it is possible to allocate a buffer at init and keep it around for later. How big of a buffer would you need?
Since several devices require DMA buffers, I had thought of adding code for a low-memory dma pool. I never got around to it though.
Another possibility is splitting the request. The caller's buffer could handle the bigger part, and a stack buffer could be used for the remaining part.
The stack is only 512 bytes (for disk ops) so you don't have that much space.
Yet another possibility is to backup the byte that gets overwritten by the current patch method and restore it afterwards. While that is an unreliable method for paged code, it might work here. The only danger I can think of is if the caller's buffer is at the end of a physical DRAM range.
That seems fragile.
Thanks, -Kevin
Kevin O'Connor wrote:
]> Peter Stuge wrote: ]> ]> ]Still it's not nice to write outside the callers buffer. Another OS ]> ]might call same function and SeaBIOS would end up corrupting some ]> ]variable. Ungood. I guess memmove() is the only choice? ]> ]> ]//Peter ]> ]> I had a couple of ideas for a more sound solution. ] ]This came up for DMA on IDE too. What I did there was just fallback ]to PIO if the buffer was not aligned. ] ]Out of curiosity, can you see what happens if you return ]DISK_RET_EBOUNDARY in the unaligned case?
DOS tries the same request 10 times then ignores the error and continues.
]It's not currently possible to dynamically allocate memory during ]runtime. However, it is possible to allocate a buffer at init and ]keep it around for later. How big of a buffer would you need?
I think a single aligned 512 byte buffer would do. The code could read a sector at a time and memcpy from the seabios aligned buffer into the caller's unaligned buffer.
]Thanks, ]-Kevin
On Tue, Apr 26, 2011 at 09:15:04PM -0500, Scott Duplichan wrote:
Kevin O'Connor wrote: ]Out of curiosity, can you see what happens if you return ]DISK_RET_EBOUNDARY in the unaligned case?
DOS tries the same request 10 times then ignores the error and continues.
Thanks for testing.
]It's not currently possible to dynamically allocate memory during ]runtime. However, it is possible to allocate a buffer at init and ]keep it around for later. How big of a buffer would you need?
I think a single aligned 512 byte buffer would do. The code could read a sector at a time and memcpy from the seabios aligned buffer into the caller's unaligned buffer.
Yeah. One can call malloc_low() during the init phase to allocate this buffer. Ideally, though, there would be some way to share buffer space with the other users (eg, cdemu, ide dma, usb).
-Kevin
Kevin O'Connor wrote:
]Yeah. One can call malloc_low() during the init phase to allocate ]this buffer. Ideally, though, there would be some way to share buffer ]space with the other users (eg, cdemu, ide dma, usb). ] ]-Kevin
Hello Kevin,
The attached patch used a malloc_low disk sector buffer when the caller's buffer is not aligned. The read handling is tested with DOS and the write handling is untested.
There is another problem. The same DOS image as eltorito CD-ROM causes the same problem, but with atapi. So maybe a shared 2048 byte buffer is needed.
Thanks, Scott
On Mon, May 02, 2011 at 12:48:45PM -0500, Scott Duplichan wrote:
Kevin O'Connor wrote: ]Yeah. One can call malloc_low() during the init phase to allocate ]this buffer. Ideally, though, there would be some way to share buffer ]space with the other users (eg, cdemu, ide dma, usb). ] ]-Kevin
Hello Kevin,
The attached patch used a malloc_low disk sector buffer when the caller's buffer is not aligned. The read handling is tested with DOS and the write handling is untested.
There is another problem. The same DOS image as eltorito CD-ROM causes the same problem, but with atapi. So maybe a shared 2048 byte buffer is needed.
Thanks Scott,
I'd like to commit these fixes. Is there any chance you could break them up into separate patches - one per fix?
Gerd - can you also take a look through Scott's changes and confirm they look okay to you?
-Kevin
Thanks Scott,
I'd like to commit these fixes. Is there any chance you could break them up into separate patches - one per fix?
That would be great for review too.
thanks, Gerd
Gerd Hoffmann wrote,
]> Thanks Scott, ]> ]> I'd like to commit these fixes. Is there any chance you could break ]> them up into separate patches - one per fix? ] ]That would be great for review too. ] ]thanks, ] Gerd
Hello Gerd,
Oh, I see now I completely overlooked Kevin's original request to break up the AHCI patch. I will do that later today.
Thanks, Scott
Gerd Hoffmann wrote,
]> Thanks Scott, ]> ]> I'd like to commit these fixes. Is there any chance you could break ]> them up into separate patches - one per fix? ] ]That would be great for review too. ] ]thanks, ] Gerd
Hello Gerd,
Attached is a two part version of the previous patch. Stage1 is enough to get through most OS installs. Stage2 adds unaligned buffer support needed for MS-DOS.
Thanks, Scott
Hi,
Attached is a two part version of the previous patch. Stage1 is enough to get through most OS installs. Stage2 adds unaligned buffer support needed for MS-DOS.
Hmm, patch #1 is still a collection of multiple changes, looks a bit like trying this and that until it somehow worked, without figuring which changes where needed to make the box boot. Also the changelog should describe what was changed and why, not only the effect of the change.
Polling the IRQ status looks like a sensible thing to do. Note that there might be multiple bits set in the IRQ status register, so you can't use irqbits == 0x01 to check the status. qemu fails to boot because of that bug. Also it isn't obvious why you are looking for others than the "Device to Host Register FIS" interrupt. Care to explain?
I've attached a patch which switches ahci over to irq status polling, based on your patch. Works nicely in qemu. Can you give it a spin on real hardware?
cheers, Gerd
Gerd Hoffmann wrote:
] Hi, ] ]> Attached is a two part version of the previous patch. Stage1 is enough ]> to get through most OS installs. Stage2 adds unaligned buffer support ]> needed for MS-DOS. ] ]Hmm, patch #1 is still a collection of multiple changes, looks a bit ]like trying this and that until it somehow worked, without figuring ]which changes where needed to make the box boot.
Thank you for the feedback. This is a pretty harsh assessment of work that transforms non-functioning code into something usable on real hardware. Have you ever written code for real hardware or do work strictly with software models?
] Also the changelog ]should describe what was changed and why, not only the effect of the ]change.
These documents will help you understand the changes: http://download.intel.com/technology/serialata/pdf/rev1_3.pdf http://www.lttconn.com/res/lttconn/pdres/201005/20100521170123066.pdf
]Polling the IRQ status looks like a sensible thing to do. Note that ]there might be multiple bits set in the IRQ status register, so you ]can't use irqbits == 0x01 to check the status. qemu fails to boot ]because of that bug.
Have you considered the possibility that the reason this works on real hardware and not qemu is that the qemu AHCI model has an erratum?
] Also it isn't obvious why you are looking for ]others than the "Device to Host Register FIS" interrupt. Care to explain?
I don't understand this question.
]I've attached a patch which switches ahci over to irq status polling, ]based on your patch. Works nicely in qemu. Can you give it a spin on ]real hardware?
It fails on real (AMD) hardware.
I am surprised by qemu bias of your questions. The type of question I expected is "great to hear it works on AMD hardware, but what about Intel hardware?" My answer would be that I cannot easily test the code on Intel hardware.
Thanks, Scott
On 05/26/11 20:09, Scott Duplichan wrote:
Gerd Hoffmann wrote:
] Hi, ] ]> Attached is a two part version of the previous patch. Stage1 is enough ]> to get through most OS installs. Stage2 adds unaligned buffer support ]> needed for MS-DOS. ] ]Hmm, patch #1 is still a collection of multiple changes, looks a bit ]like trying this and that until it somehow worked, without figuring ]which changes where needed to make the box boot.
Thank you for the feedback. This is a pretty harsh assessment of work that transforms non-functioning code into something usable on real hardware.
Well, it breaks on qemu, and it does because of bugs in the code.
Have you ever written code for real hardware or do work strictly with software models?
I've actually done linux driver development (video4linux) before I've started working in the virtualization field.
]Polling the IRQ status looks like a sensible thing to do. Note that ]there might be multiple bits set in the IRQ status register, so you ]can't use irqbits == 0x01 to check the status. qemu fails to boot ]because of that bug.
Have you considered the possibility that the reason this works on real hardware and not qemu is that the qemu AHCI model has an erratum?
That is one possible reason, yes. Thats why I want understand which changes are needed to make ahci work on real hardware and especially *why* they are needed. Quite possible that qemu needs a fix, that certainly wouldn't be the first emulation bug ...
The way you check irqbits is a clear bug in your patch though. It is pure luck that you don't hit it on real hardware, probably due to the timing being different.
] Also it isn't obvious why you are looking for ]others than the "Device to Host Register FIS" interrupt. Care to explain?
I don't understand this question.
The device sends status register updates using the device to host register fis, right? Why do you check for updates of the other fis types? I don't see the point in doing so. Also you are reading error + status from the dma setup fis where those fields don't exist.
Why you are ignoring the status register when figuring whenever a operation was successful or not?
]I've attached a patch which switches ahci over to irq status polling, ]based on your patch. Works nicely in qemu. Can you give it a spin on ]real hardware?
It fails on real (AMD) hardware.
Ok, so switching the status polling alone doesn't cut it. Your patch also removes the loop waiting for the device becoming ready in ahci_port_probe(). Why?
It is possible that real hardware is simply alot slower than the qemu emulated hardware. Have you tried to increate the delay and/or the number of allowed retries to give the devices more time to become ready instead of just removing the loop?
I am surprised by qemu bias of your questions. The type of question I expected is "great to hear it works on AMD hardware, but what about Intel hardware?"
I want it work everythere. That includes qemu of course.
cheers, Gerd