While looking at VM bootup times, we stumbled over the fact that the NVMe
code only does I/O operations of up to 4kb at a given point in time. This
is usually ok, but if you have an OS that loads a lot of data on boot in
combination to network backed storage, it shows in bootup times.
There is no need to restrict ourselves to 4kb though. The INT13 call we
receive gives us much larger chunks which we can just map into a native
bigger NVMe I/O call if the request buffer is page aligned.
This …
[View More]patch implements all logic required to do the above and gives a
substantial performance boost on boot.
v1 -> v2:
- fix dprintf
- Fix bounds check on PRP list add logic
- Reduce PRP list to 15 entries embedded in the ns struct.
This reduces BIOS reserved memory footprint by 4kb.
v2 -> v3:
- new patch: nvme: Split requests by maximum allowed size
- Adjust the maximum request size to sector units. The hardware field
describes it in pages.
Alexander Graf (4):
nvme: Record maximum allowed request size
nvme: Allow to set PRP2
nvme: Pass large I/O requests as PRP lists
nvme: Split requests by maximum allowed size
src/hw/nvme-int.h | 16 ++++++-
src/hw/nvme.c | 122 +++++++++++++++++++++++++++++++++++++++++++++++-------
2 files changed, 121 insertions(+), 17 deletions(-)
--
2.16.4
Amazon Development Center Germany GmbH
Krausenstr. 38
10117 Berlin
Geschaeftsfuehrung: Christian Schlaeger, Jonathan Weiss
Eingetragen am Amtsgericht Charlottenburg unter HRB 149173 B
Sitz: Berlin
Ust-ID: DE 289 237 879
[View Less]
This patch series make seabios linkable with lld.
This is beneficial for FreeBSD ports as well
https://svnweb.freebsd.org/ports/head/misc/seabios/Makefile
as they can drop an LLD_UNSAFE
As a maintainer of lld ELF, I have triaged numerous pieces of software.
seabios is the only one making use of
This drops the only instance https://sourceware.org/bugzilla/show_bug.cgi?id=12726
1, 2, 3 and 4 are really independent and can be applied in arbitrary order.
5 depends on 4. I originally combined 4 …
[View More]and 5 and that was why I don't think a patch series made sense.
Fangrui Song (5):
romlayout.S: Add missing SHF_ALLOC flag to .fixedaddr.\addr
Make rom16.o linkable with lld
Makefile: Change ET_EXEC to ET_REL so that lld can link bios.bin.elf
romlayout32flag.lds: Use `. +=` instead of `. =`
test-build.sh: Delete unneeded LD capability test
Makefile | 4 ++++
scripts/layoutrom.py | 13 ++++++++++---
scripts/test-build.sh | 42 +-----------------------------------------
src/romlayout.S | 2 +-
4 files changed, 16 insertions(+), 45 deletions(-)
--
2.26.0.292.g33ef6b2f38-goog
[View Less]
When binding AHCI controller to virtual machine, the attached ATA device cannot be recognized for the following reason:
The first Identify PACKET CMD will be failed without dobut which will lead PORT_IRQ_TF_ERR be set in PxIS due to ERR bit set to 1 in D2H Register FIS(PxTFD.STS.ERR=1). In this case, the first CMD will be timeout for seabios only polling PORT_IRQ_D2H_REG_FIS & PORT_IRQ_PIOS_FIS before triggering error handling which leads AHCI controller is an error state, then the second …
[View More]IDENTIFY CMD cannot be issued by AHCI controller, no ATA device will be detected.
In AHCI spec HBA Port State Machine section(5.3.8.1 and 5.3.16.5), we can see this sequence: RegFIS:Entry --> ERR:FatalTaskfile (PxTFD.STS.ERR=1) --> ERR:WaitForClear, no entry for PORT_IRQ_D2H_REG_FIS/PORT_IRQ_PIOS_FIS to be set at this time.
May be we can also add PORT_IRQ_TF_ERR bit check when polling command completion status?
fail log:
02.386: |bffa6000| AHCI/1: link up
02.387: |bffa6000| AHCI/1: send cmd ...
02.387: |bffa7000| phys_free bffa9c00 (detail=0xbffaa320)
02.396: |bffa7000| phys_free bffaa200 (detail=0xbffaa1d0)
02.396: |bffa7000| phys_free bffaa000 (detail=0xbffaa1a0)
02.396: |bffa7000| phys_free bffaa380 (detail=0xbffaa350)
02.400: \bffa7000/ End thread
02.401: phys_free bffa7000 (detail=0xbffaa170)
34.391: |bffa6000| WARNING - Timeout at ahci_command:153!
34.392: |bffa6000| AHCI/1: send cmd ...
66.397: |bffa6000| WARNING - Timeout at ahci_command:153!
Signed-off-by: zhaoxin\runaguooc <RunaGuo-oc(a)zhaoxin.com>
---
src/hw/ahci.c | 10 +++++++++-
1 file changed, 9 insertions(+), 1 deletion(-)
diff --git a/src/hw/ahci.c b/src/hw/ahci.c index d45b430..bab5c35 100644
--- a/src/hw/ahci.c
+++ b/src/hw/ahci.c
@@ -104,7 +104,7 @@ static void ahci_port_writel(struct ahci_ctrl_s *ctrl, u32 pnr, u32 reg, u32 val static int ahci_command(struct ahci_port_s *port_gf, int iswrite, int isatapi,
void *buffer, u32 bsize) {
- u32 val, status, success, flags, intbits, error;
+ u32 val, status, success, flags, intbits, error, tf;
struct ahci_ctrl_s *ctrl = port_gf->ctrl;
struct ahci_cmd_s *cmd = port_gf->cmd;
struct ahci_fis_s *fis = port_gf->fis; @@ -148,6 +148,14 @@ static int ahci_command(struct ahci_port_s *port_gf, int iswrite, int isatapi,
error = GET_LOWFLAT(fis->rfis[3]);
break;
}
+
+ if(intbits & PORT_IRQ_TF_ERR){
+ tf = ahci_port_readl(ctrl,pnr,PORT_TFDATA);
+ status = tf & 0xff;
+ error = tf & 0xff00;
+ break;
+ }
+
}
if (timer_check(end)) {
warn_timeout();
--
2.17.1
[View Less]
Hi Nico, thank you very much for your kind help, it really helped to
advance! Please could you answer a small question:
While cleaning up some code I noticed there are Misc,Misc0,Misc1,Misc2
interrupts - with the following "magic" values respectively:
mainboard_picr_data:
[0x08] = 0x5A,0xF1,0x00,0x00,
mainboard_intr_data:
[0x08] = 0x00,0x00,0x00,0x00,
I tried replacing all these values by 0x1F ("unused") : the _intr_data
0x1F values got reverted to 0x00 during the boot time, while the
…
[View More]_picr_data 0x1F values stayed - but caused a SeaBIOS to hang while
booting (so I had to unbrick a laptop). Do you have any guess: from
where these 0x5A,0xF1,0x00,0x00 values for Misc interrupts come from,
and why are they essential for a SeaBIOS to work? Bolton FCH BIOS Dev
Guide doesn't tell anything about them, just mentions their offsets.
Best regards,
Mike
On Wed, Nov 11, 2020 at 2:43 AM Nico Huber <nico.h(a)gmx.de> wrote:
>
> Hi Mike,
>
> On 10.11.20 19:22, Mike Banon wrote:
> > Thank you very much for your advice, dear Naresh, I will try matching
> > the UEFI routing.
>
> I wouldn't expect too much. If things are configurable in the chipset
> (they usually are these days) it's possible that coreboot configures
> them differently and then the tables have to differ too.
>
> >> this old "getpir" utility may help you ;)
> >> You may have to run:
> >> coreboot/util$ git revert 6c90f3334e65ff4b0ff4900df77bc33d53beb677
> >
> > What I already discovered:
> > *) To activate the irq_tables.c / intel_irq_routing_table code, I need
> > to enable CONFIG_HAVE_PIRQ_TABLE and CONFIG_GENERATE_PIRQ_TABLE. But I
> > don't see it having any visible effect on the IRQ routing: instead,
> > maybe this intel_irq_routing_table is supposed to be a reflection of
> > this routing?
>
> I would only give these things a look if you are 200% sure that you need
> it. Basically no major OS has used these in two decades, hence most of
> it in coreboot is untested and broken. I wouldn't be surprised if there
> isn't a single case of correct PIRQ tables in the tree.
>
> You should check what KolibriOS actually uses. If it's not ACPI there
> are these three options (that I know about):
>
> * MP table
> * PIRQ tables
> * INT_LINE registers in each PCI device' configuration space
>
> The latter is both exceptionally simple and fragile. It ignores that
> the OS could make any changes at runtime. The expected IRQ number for
> each PCI device is simply written into that register by the firmware.
> It's ignored by the hardware but can be read by the OS. Here's a
> simple example:
>
> * Assuming there's a device PCI 03:00.0 triggering PCI INTA.
> * The chipset is configured to translate that to PIRQ_B.
> * PIRQ_B is configured to trigger IRQ 4.
> * Then coreboot would just write 4 into INT_LINE of 03:00.0.
>
> The OS could then pick that 4 up and it would work as long as nothing
> changes the PIRQ configuration. As all the PIRQ information is lost,
> the OS can't optimize things; but KolibriOS probably wouldn't want to?
>
> > *) By adding to mainboard.c / mainboard_pirq_data structure these lines
> > {NB_PCIE_PORT3_DEVFN, {PIRQ_A, PIRQ_B, PIRQ_C, PIRQ_D}}, /*
> > x4 PCIe: 02.3 */ /* 0:04.00 / 2:00.00 - IRQ 3 */
> > {NB_PCIE_PORT4_DEVFN, {PIRQ_A, PIRQ_B, PIRQ_C, PIRQ_D}}, /*
> > x4 PCIe: 02.4 */ /* 0:05.00 / 3:00.00 - IRQ 3 */
> > I got the interrupts assigned to Ethernet 2:00.00 and WiFi 3:00.00
> > devices, which are behind the 0:04.00 and 0:05.00 bridges
> > respectively. And this assignment is even visible by KolibriOS now!
> > But I don't know if it's normal that a lot of devices are sharing the
> > same IRQ 3 now, is it bad?
>
> If sharing is bad depends on the type of devices. As long as they
> are all PCI devs, it should work (maybe not at maximum efficiency).
> But if, for instance, it's shared with a legacy UART port, that
> couldn't work (different mode of IRQ, level vs. edge triggered).
>
> I'm not 100% sure, but it seems that with these lines you can
> control how an interrupt INTA (/B/C/D) message (PCIe always uses
> in-band messages) is interpreted by the chipset. It looks like
> you tell it INTA (the most used one) will trigger PIRQ_A, INTB
> PIRQ_B, etc. If it's really configurable (otherwise I don't see
> why there should be such a table), you could try to shuffle these
> around. e.g.
>
> { NB_PCIE_PORT4_DEVFN, { PIRQ_B, PIRQ_C, PIRQ_D, PIRQ_A } },
>
>
>
> Hope that helps,
> Nico
[View Less]
On Wed, 2020-11-04 at 23:27 +0000, Graf (AWS), Alexander wrote:
> > Am 31.10.2020 um 00:49 schrieb David Woodhouse <dwmw2(a)infradead.org>:
> >
> > From: David woodhouse <dwmw(a)amazon.co.uk>
> >
> > This ended up with an odd mix of recursion (albeit *mostly* tail-recursion)
> > and interation that could have been prettier. Make it prettier by making
> > nvme_build_prpl() return the number of blocks it consumes, and just
> > looping.
…
[View More]> >
> > If nvme_build_prpl() doesn't want to play, just fall through to the
> > original loop.
> >
> > Signed-off-by: David Woodhouse <dwmw(a)amazon.co.uk>
>
> Please don't merge just yet. I think we found a bug with this patch,
> but need to find out where exactly.
Found both of them :)
> > @@ -725,34 +726,28 @@ nvme_cmd_readwrite(struct nvme_namespace *ns, struct disk_op_s *op, int write)
> > {
> > int res = DISK_RET_SUCCESS;
> > u16 const max_blocks = NVME_PAGE_SIZE / ns->block_size;
> > - u16 i;
> > + u16 i, blocks;
> >
> > - /* Split up requests that are larger than the device can handle */
> > - if (op->count > ns->max_req_size) {
> > - u16 count = op->count;
> > + while (op->count && ((blocks = nvme_build_prpl(ns, op)) > 0)) {
1. If nvme_build_prpl() returns -1 that becomes (u16)0xFFFF and that's >0.
> > + res = nvme_io_readwrite(ns, op->lba, ns->prp1, blocks, write);
> > + dprintf(5, "ns %u %s lba %llu+%u: %d\n", ns->ns_id, write ? "write"
> > + : "read",
> > + op->lba, blocks, res);
> >
> > - /* Handle the first max_req_size elements */
> > - op->count = ns->max_req_size;
> > - if (nvme_cmd_readwrite(ns, op, write))
> > + if (res != DISK_RET_SUCCESS)
> > return res;
> >
> > - /* Handle the remainder of the request */
> > - op->count = count - ns->max_req_size;
> > - op->lba += ns->max_req_size;
> > - op->buf_fl += (ns->max_req_size * ns->block_size);
2. We can't do this. Callers expect op->count to contain the number of
blocks of I/O actually performed...
> > - return nvme_cmd_readwrite(ns, op, write);
> > - }
> > -
> > - if (!nvme_build_prpl(ns, op)) {
> > - /* Request goes via PRP List logic */
> > - return nvme_io_readwrite(ns, op->lba, ns->prp1, op->count, write);
> > + op->count -= blocks;
> > + op->lba += blocks;
> > + op->buf_fl += (blocks * ns->block_size);
... which means that when my cleanup made it happen unconditionally
instead of only in the case where the max size had been exceeded
and it actually had to recurse, it started showing up in the test
runs. But commit 94f0510dc has the problem already.
Running another full test set now, will post the fixed patch when it
completes.
[View Less]