Hi,
This patch series enables 64bit BAR support in seabios. It takes a somewhat different approach to resource accounting. We did this because we wanted to: a) provide 64bit support for PCI BARs and for bridges with a 64bit memory window; b) allow migration to 64bit address ranges if we do not fit into the 32bit range; c) keep the implementation simple.
There are still two main passes to enumerate resources and map devices, but the structures have changed. We introduce two new structures: pci_region and pci_region_entry.
The pci_region structure includes a list of pci_region_entries. Each pci_region_entry can be either a PCI BAR or a downstream PCI region (bridge). Each entry has a set of attributes: type (IO, MEM, PREFMEM), is64bit (set if the address can be above 4GB), size, base address, the owning PCI device, and a pointer to the pci_region it belongs to.
In the first pass we fill the pci_regions with entries and discover the topology. In the second pass we try to assign addresses to the pci_regions. If there is not enough space available, the 64bit entries of the root regions are migrated to 64bit address ranges and address assignment is retried. Then each entry of each region is mapped.
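The retry-with-migration idea above can be sketched as a toy model (all names and sizes here are illustrative, not taken from the patch):

```c
/* Toy model of the second pass: if the 32bit window cannot hold the sum
 * of all entries, migrate the 64bit-capable ones to a 64bit range and
 * retry.  Hypothetical helper names; not the seabios code. */
struct toy_entry { unsigned long long size; int is64bit; int in64range; };

/* Sum of the entries still competing for the 32bit window. */
static unsigned long long sum32(struct toy_entry *e, int n)
{
    unsigned long long s = 0;
    for (int i = 0; i < n; i++)
        if (!e[i].in64range)
            s += e[i].size;
    return s;
}

/* Returns 1 if everything fits in 'window32', after optional migration. */
static int toy_allocate(struct toy_entry *e, int n, unsigned long long window32)
{
    if (sum32(e, n) <= window32)
        return 1;                       /* first attempt succeeded */
    for (int i = 0; i < n; i++)         /* migrate 64bit-capable entries */
        if (e[i].is64bit)
            e[i].in64range = 1;
    return sum32(e, n) <= window32;     /* second attempt */
}
```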
The patch series includes 6 patches.
In the 1st patch we introduce the new structures.
In the 2nd patch we introduce support functions for basic hlist operations and modify service functions to support 64bit address ranges. Note: I've seen similar hlist operations in the post memory manager and in the stack location code; it would make sense to move them to a header file.
In the 3rd patch a new function is added that fills the pci_region structures with entries and discovers the topology.
In the 4th patch we define the address range for each pci_region structure, migrate entries to the 64bit address range if necessary, and program the PCI BAR addresses and bridge regions.
In the 5th patch we remove the old code.
The last patch was proposed by Michael Tsirkin; it contains the changes to the acpi-dsdt.dsl file that are necessary to support 64bit BARs on Windows.
 src/acpi-dsdt.dsl |    7 +
 src/acpi-dsdt.hex |   72 ++++++--
 src/config.h      |    2 +
 src/pci.h         |    6 -
 src/pciinit.c     |  509 +++++++++++++++++++++++++++++-----------------------
 5 files changed, 352 insertions(+), 244 deletions(-)
Note: at the moment there are three issues related to 64bit BAR support in qemu (head of the master branch). It is very likely they will be fixed in the next qemu release.
The 1st one is described here (this issue causes problems if a 64bit BAR is mapped below 4GB and the Linux guest kernel version is < 2.6.27): http://www.mail-archive.com/qemu-devel@nongnu.org/msg94522.html
The 2nd one is just a typo in the i440fx init code (this issue causes problems for anybody trying to access 64bit PCI memory - the memory will be inaccessible): http://www.mail-archive.com/qemu-devel@nongnu.org/msg99423.html
The 3rd issue is related to HUGE PCI BARs, i.e. BARs whose size is 4GB or more. For some reason qemu reports a zero size in this case. The new seabios code should handle huge BARs correctly.
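One plausible mechanism for a zero-size report (my own illustration of the arithmetic, not taken from the qemu source): sizing a 4GB BAR with 32bit math wraps to zero, while the 64bit formula used by the new pci_get_bar_size() gets it right.

```c
/* Model of BAR sizing arithmetic for a 4GB memory BAR.  After writing
 * all-ones, such a BAR reads back 0 in the low dword (ignoring the type
 * bits) and 0xffffffff in the high dword.  Illustrative only. */
typedef unsigned int u32;
typedef unsigned long long u64;

/* Classic 32bit size formula: wraps to zero for a BAR of 4GB or more. */
static u32 bar_size32(u32 sz_lo, u32 mask)
{
    return (~(sz_lo & mask)) + 1;
}

/* 64bit formula: combine both dwords before inverting. */
static u64 bar_size64(u32 sz_lo, u32 sz_hi, u32 mask)
{
    u64 mask64 = mask | ((u64)0xffffffff << 32);
    u64 sz64 = ((u64)sz_hi << 32) | sz_lo;
    return (~(sz64 & mask64)) + 1;
}
```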
I've sent patches for the first two issues. Once they are all applied, every problem except the huge-BARs issue should be gone.
This patch introduces two structures in place of the old pci_bus structure. The pci_region structure describes one bus region; it includes a list of pci_region_entries and a base address. The number of pci_region structures is PCI_REGION_TYPE_COUNT * the number of busses. Two extra regions can be added if we need 64bit address ranges.
The pci_region_entry structure describes a PCI BAR resource or a downstream PCI region (bridge region). Each entry is assigned to a particular pci_region. If the entry describes a bridge region, it provides a pci_region for the downstream devices. With this we can easily build the topology and migrate entries when necessary.
Signed-off-by: Alexey Korolev <alexey.korolev@endace.com>
---
 src/pci.h     |    6 ------
 src/pciinit.c |   37 ++++++++++++++++++++++---------------
 2 files changed, 22 insertions(+), 21 deletions(-)
diff --git a/src/pci.h b/src/pci.h index a2a5a4c..8fa064f 100644 --- a/src/pci.h +++ b/src/pci.h @@ -51,12 +51,6 @@ struct pci_device { u8 prog_if, revision; u8 header_type; u8 secondary_bus; - struct { - u32 addr; - u32 size; - int is64; - } bars[PCI_NUM_REGIONS]; - // Local information on device. int have_driver; }; diff --git a/src/pciinit.c b/src/pciinit.c index 9f3fdd4..2e5416c 100644 --- a/src/pciinit.c +++ b/src/pciinit.c @@ -31,18 +31,24 @@ static const char *region_type_name[] = { [ PCI_REGION_TYPE_PREFMEM ] = "prefmem", };
-struct pci_bus { - struct { - /* pci region stats */ - u32 count[32 - PCI_MEM_INDEX_SHIFT]; - u32 sum, max; - /* seconday bus region sizes */ - u32 size; - /* pci region assignments */ - u32 bases[32 - PCI_MEM_INDEX_SHIFT]; - u32 base; - } r[PCI_REGION_TYPE_COUNT]; - struct pci_device *bus_dev; +struct pci_region; +struct pci_region_entry { + struct pci_device *dev; + int bar; + u64 base; + u64 size; + int is64bit; + enum pci_region_type type; + struct pci_region *this_region; + struct pci_region *parent_region; + struct pci_region_entry *next; + struct pci_region_entry **pprev; +}; + +struct pci_region { + struct pci_region_entry *list; + struct pci_region_entry *this_entry; + u64 base; };
static int pci_size_to_index(u32 size, enum pci_region_type type) @@ -582,12 +588,13 @@ pci_setup(void) pci_probe_devices();
dprintf(1, "=== PCI new allocation pass #1 ===\n"); - struct pci_bus *busses = malloc_tmp(sizeof(*busses) * (MaxPCIBus + 1)); - if (!busses) { + int num_regions = (MaxPCIBus + 1) * PCI_REGION_TYPE_COUNT; + struct pci_region *regions = malloc_tmp(sizeof(*regions) * num_regions); + if (!regions) { warn_noalloc(); return; } - memset(busses, 0, sizeof(*busses) * (MaxPCIBus + 1)); + memset(regions, 0, sizeof(*regions) * num_regions); pci_bios_check_devices(busses); if (pci_bios_init_root_regions(&busses[0], start, end) != 0) { panic("PCI: out of address space\n");
This patch is all about service functions. It includes:
- basic operations on lists
- a 64bit modification of pci_size_roundup()
- a modification of pci_bios_get_bar() to support HUGE BARs (size over 4GB)
- 2 service functions to collect pci_region statistics
- dump_entry() for debug output
Signed-off-by: Alexey Korolev <alexey.korolev@endace.com>
---
 src/pciinit.c |  132 +++++++++++++++++++++++++++++++++++++++------------------
 1 files changed, 91 insertions(+), 41 deletions(-)
diff --git a/src/pciinit.c b/src/pciinit.c index 2e5416c..dbfa4f2 100644 --- a/src/pciinit.c +++ b/src/pciinit.c @@ -51,33 +51,30 @@ struct pci_region { u64 base; };
-static int pci_size_to_index(u32 size, enum pci_region_type type) -{ - int index = __fls(size); - int shift = (type == PCI_REGION_TYPE_IO) ? - PCI_IO_INDEX_SHIFT : PCI_MEM_INDEX_SHIFT; - - if (index < shift) - index = shift; - index -= shift; - return index; -} +#define foreach_region_entry(R, ENTRY) \ + for (ENTRY = (R)->list; ENTRY; ENTRY = ENTRY->next)
-static u32 pci_index_to_size(int index, enum pci_region_type type) -{ - int shift = (type == PCI_REGION_TYPE_IO) ? - PCI_IO_INDEX_SHIFT : PCI_MEM_INDEX_SHIFT; +#define foreach_region_entry_safe(R, N, ENTRY) \ + for (ENTRY = (R)->list; ENTRY && ({N=ENTRY->next; 1;}); \ + ENTRY = N)
- return 0x1 << (index + shift); +static inline void region_entry_del(struct pci_region_entry *entry) +{ + struct pci_region_entry *next = entry->next; + *entry->pprev = next; + if (next) + next->pprev = entry->pprev; }
-static enum pci_region_type pci_addr_to_type(u32 addr) +static inline void +region_entry_add(struct pci_region *r, struct pci_region_entry *entry) { - if (addr & PCI_BASE_ADDRESS_SPACE_IO) - return PCI_REGION_TYPE_IO; - if (addr & PCI_BASE_ADDRESS_MEM_PREFETCH) - return PCI_REGION_TYPE_PREFMEM; - return PCI_REGION_TYPE_MEM; + struct pci_region_entry *first = r->list; + entry->next = first; + if (first) + first->pprev = &entry->next; + r->list = entry; + entry->pprev = &r->list; }
static u32 pci_bar(struct pci_device *pci, int region_num) @@ -324,38 +321,91 @@ pci_bios_init_bus(void) pci_bios_init_bus_rec(0 /* host bus */, &pci_bus); }
- /**************************************************************** * Bus sizing ****************************************************************/
-static u32 pci_size_roundup(u32 size) +static u64 pci_size_roundup(u64 size) { - int index = __fls(size-1)+1; - return 0x1 << index; + int index = __fls((u32)((size - 1) >> 32)); + if (!index) + index = __fls((u32)(size - 1)); + return 0x1 << (index + 1); }
-static void -pci_bios_get_bar(struct pci_device *pci, int bar, u32 *val, u32 *size) +static u64 +pci_get_bar_size(struct pci_device *pci, int bar, + enum pci_region_type *type, int *is64bit) { u32 ofs = pci_bar(pci, bar); u16 bdf = pci->bdf; - u32 old = pci_config_readl(bdf, ofs); - u32 mask; - - if (bar == PCI_ROM_SLOT) { - mask = PCI_ROM_ADDRESS_MASK; - pci_config_writel(bdf, ofs, mask); + u32 l, sz, mask; + + mask = (bar == PCI_ROM_SLOT) ? PCI_ROM_ADDRESS_MASK : ~0; + l = pci_config_readl(bdf, ofs); + pci_config_writel(bdf, ofs, mask); + sz = pci_config_readl(bdf, ofs); + pci_config_writel(bdf, ofs, l); + + *is64bit = 0; + if (l & PCI_BASE_ADDRESS_SPACE_IO) { + mask = PCI_BASE_ADDRESS_IO_MASK; + *type = PCI_REGION_TYPE_IO; } else { - if (old & PCI_BASE_ADDRESS_SPACE_IO) - mask = PCI_BASE_ADDRESS_IO_MASK; + mask = PCI_BASE_ADDRESS_MEM_MASK; + if (l & PCI_BASE_ADDRESS_MEM_TYPE_64) + *is64bit = 1; + if (l & PCI_BASE_ADDRESS_MEM_PREFETCH) + *type = PCI_REGION_TYPE_PREFMEM; else - mask = PCI_BASE_ADDRESS_MEM_MASK; - pci_config_writel(bdf, ofs, ~0); + *type = PCI_REGION_TYPE_MEM; + } + if (*is64bit) { + u64 mask64, sz64 = sz; + l = pci_config_readl(bdf, ofs + 4); + pci_config_writel(bdf, ofs + 4, ~0); + sz = pci_config_readl(bdf, ofs + 4); + pci_config_writel(bdf, ofs + 4, l); + mask64 = mask | ((u64)0xffffffff << 32); + sz64 |= ((u64)sz << 32); + return (~(sz64 & mask64)) + 1; + } + return (u32)((~(sz & mask)) + 1); +} + +static u64 pci_region_max_size(struct pci_region *r) +{ + u64 max = 0; + struct pci_region_entry *entry; + foreach_region_entry(r, entry) { + max = (max > entry->size) ? 
max : entry->size; + } + return max; +} + +static u64 pci_region_sum(struct pci_region *r) +{ + u64 sum = 0; + struct pci_region_entry *entry; + foreach_region_entry(r, entry) { + sum += entry->size; } - *val = pci_config_readl(bdf, ofs); - pci_config_writel(bdf, ofs, old); - *size = (~(*val & mask)) + 1; + return sum; +} + +static void +dump_entry(struct pci_region_entry *entry) +{ + if (entry->this_region ) + dprintf(1, "PCI bus %d (secondary)", entry->dev->secondary_bus); + else + dprintf(1, "PCI region entry: bdf=%02x:%02x.%x BAR %d", + pci_bdf_to_bus(entry->dev->bdf), pci_bdf_to_dev(entry->dev->bdf), + pci_bdf_to_fn(entry->dev->bdf), entry->bar); + dprintf(1, " base=0x%08x%08x size=0x%08x%08x [%s, %s]\n", + (u32)(entry->base>>32), (u32)entry->base, + (u32)(entry->size>>32), (u32)entry->size, + region_type_name[entry->type],entry->is64bit ? "64bits" : "32bits"); }
static void pci_bios_bus_reserve(struct pci_bus *bus, int type, u32 size)
In this patch we fill pci_regions with entries.
The idea of the implementation is pretty much the same as before in pci_bios_check_devices(). The pci_bios_fill_regions() function scans the PCI devices:
1) If the device is a PCI-to-PCI bridge:
   a) create an empty entry;
   b) associate the new entry with the pci_region provided by the bridge;
   c) add the new entry to the region list of the parent bus.
2) If the device is not a bridge:
   a) scan its PCI BARs;
   b) get the size and attributes (type and is64bit);
   c) add a new entry to the region list of the parent bus.
Then pci_bios_fill_regions() scans the pci_regions in reverse order to calculate the sizes of the pci_region_entries belonging to bridges.
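The bridge-sizing step can be sketched like this (a minimal model, not the patch code; the minimum-size constants mirror PCI_BRIDGE_IO_MINSIZE/PCI_BRIDGE_MEM_MINSIZE):

```c
/* Model of sizing a bridge window: sum the children, enforce the bridge
 * minimum, then round up to a power of two so the window aligns naturally. */
typedef unsigned long long u64;

static u64 roundup_pow2(u64 size)
{
    u64 r = 1;
    while (r < size)
        r <<= 1;
    return r;
}

static u64 bridge_region_size(const u64 *child_sizes, int n, u64 min_size)
{
    u64 sum = 0;
    for (int i = 0; i < n; i++)
        sum += child_sizes[i];
    if (sum < min_size)
        sum = min_size;         /* e.g. 0x100000 for a memory window */
    return roundup_pow2(sum);
}
```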
Signed-off-by: Alexey Korolev <alexey.korolev@endace.com>
---
 src/pciinit.c |  103 ++++++++++++++++++++++++++++++++++++++++++++++++++++++---
 1 files changed, 98 insertions(+), 5 deletions(-)
diff --git a/src/pciinit.c b/src/pciinit.c index b02da89..03ece34 100644 --- a/src/pciinit.c +++ b/src/pciinit.c @@ -12,11 +12,10 @@ #include "pci_regs.h" // PCI_COMMAND #include "xen.h" // usingXen
-#define PCI_IO_INDEX_SHIFT 2 -#define PCI_MEM_INDEX_SHIFT 12 - -#define PCI_BRIDGE_IO_MIN 0x1000 -#define PCI_BRIDGE_MEM_MIN 0x100000 +#define PCI_DEV_IO_MINSIZE 4 +#define PCI_DEV_MEM_MINSIZE 0x1000 +#define PCI_BRIDGE_IO_MINSIZE 0x1000 +#define PCI_BRIDGE_MEM_MINSIZE 0x100000
enum pci_region_type { PCI_REGION_TYPE_IO, @@ -408,6 +407,96 @@ dump_entry(struct pci_region_entry *entry) region_type_name[entry->type],entry->is64bit ? "64bits" : "32bits"); }
+/**************************************************************** + * Build topology and calculate size of entries + ****************************************************************/ +struct pci_region_entry * +pci_region_create_entry(struct pci_region *parent, struct pci_device *dev, + u64 size, int type, int is64bit) +{ + struct pci_region_entry *entry= malloc_tmp(sizeof(*entry)); + if (!entry) { + warn_noalloc(); + return NULL; + } + memset(entry, 0, sizeof(*entry)); + + entry->dev = dev; + entry->type = type; + entry->is64bit = is64bit; + entry->size = size; + region_entry_add(parent, entry); + entry->parent_region = parent; + return entry; +} + +static int pci_bios_fill_regions(struct pci_region *regions) +{ + struct pci_region *this_region, *parent; + enum pci_region_type type; + struct pci_device *pci; + struct pci_region_entry *entry; + int is64bit, i; + u64 size, min_size; + + foreachpci(pci) { + if (pci->class == PCI_CLASS_BRIDGE_PCI) { + this_region = ®ions[pci->secondary_bus * PCI_REGION_TYPE_COUNT]; + parent = ®ions[pci_bdf_to_bus(pci->bdf) * PCI_REGION_TYPE_COUNT]; + for (type = 0; type < PCI_REGION_TYPE_COUNT; + type++, this_region++, parent++) { + /* Only prefetchable bridge regions can be 64bit */ + is64bit = (type == PCI_REGION_TYPE_PREFMEM); + entry = pci_region_create_entry(parent, pci, 0, type, is64bit); + if (!entry) + return -1; + entry->this_region = this_region; + this_region->this_entry = entry; + } + continue; + } + for (i = 0; i < PCI_NUM_REGIONS; i++) { + size = pci_get_bar_size(pci, i, &type, &is64bit); + if (size == 0) + continue; + min_size = (type == PCI_REGION_TYPE_IO) ? + PCI_DEV_IO_MINSIZE : PCI_DEV_MEM_MINSIZE; + size = (size > min_size) ? 
size : min_size; + + parent = ®ions[pci_bdf_to_bus(pci->bdf) * PCI_REGION_TYPE_COUNT + + type]; + entry = pci_region_create_entry(parent, pci, size, type, is64bit); + if (!entry) + return -1; + entry->bar = i; + dump_entry(entry); + if (is64bit) + i++; + } + } + + for (i = (MaxPCIBus + 1) * PCI_REGION_TYPE_COUNT ; i < 0; i--) { + struct pci_region_entry *this_entry = regions[i-1].this_entry; + if(!this_entry) + continue; + + is64bit = this_entry->is64bit; + size = 0; + foreach_region_entry(®ions[i-1], entry) { + size += entry->size; + is64bit &= entry->is64bit; + } + min_size = (this_entry->type == PCI_REGION_TYPE_IO) ? + PCI_BRIDGE_IO_MINSIZE : PCI_BRIDGE_MEM_MINSIZE; + size = (size > min_size) ? size : min_size; + this_entry->is64bit = is64bit; + this_entry->size = pci_size_roundup(size); + dump_entry(entry); + } + return 0; +} + + static void pci_bios_bus_reserve(struct pci_bus *bus, int type, u32 size) { u32 index; @@ -645,6 +734,10 @@ pci_setup(void) return; } memset(regions, 0, sizeof(*regions) * num_regions); + if (pci_bios_fill_regions(regions)) { + free(regions); + return; + } pci_bios_check_devices(busses); if (pci_bios_init_root_regions(&busses[0], start, end) != 0) { panic("PCI: out of address space\n");
In pci_bios_map_regions() we try to reserve memory for all entries of the root bus regions. If pci_bios_init_root_regions() fails - e.g. there is not enough space - we create two new pci_regions, r64pref and r64mem, migrate all 64bit-capable entries to them, and then try pci_bios_init_root_regions() again. The migration process is very simple: delete the entry from one list and add it to the other.
If it passes, we map the entries of each region:
1. Calculate the base address of the entry and advance the pci_region base address.
2. Program the PCI BAR or bridge region. If the entry belongs to a PCI-to-PCI bridge and provides a pci_region for downstream devices, set the base address of the region the entry provides.
3. Delete the entry.
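A small toy model of step 1, showing why pci_region_map_entries() walks the entries in descending size order: starting from a base aligned to the largest entry and handing out power-of-two sizes largest-first keeps every assigned address naturally aligned (illustrative, not the patch code):

```c
/* sizes[] must be sorted descending, each a power of two; base must be
 * aligned to sizes[0].  Each entry then lands on a multiple of its size. */
typedef unsigned long long u64;

static void map_descending(u64 base, const u64 *sizes, u64 *addrs, int n)
{
    for (int i = 0; i < n; i++) {
        addrs[i] = base;    /* already aligned to sizes[i] */
        base += sizes[i];
    }
}
```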
Signed-off-by: Alexey Korolev <alexey.korolev@endace.com>
---
 src/config.h  |    2 +
 src/pciinit.c |  123 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++-
 2 files changed, 124 insertions(+), 1 deletions(-)
diff --git a/src/config.h b/src/config.h index b0187a4..bbacae7 100644 --- a/src/config.h +++ b/src/config.h @@ -47,6 +47,8 @@
#define BUILD_PCIMEM_START 0xe0000000 #define BUILD_PCIMEM_END 0xfec00000 /* IOAPIC is mapped at */ +#define BUILD_PCIMEM64_START 0x8000000000ULL +#define BUILD_PCIMEM64_END 0x10000000000ULL
#define BUILD_IOAPIC_ADDR 0xfec00000 #define BUILD_HPET_ADDRESS 0xfed00000 diff --git a/src/pciinit.c b/src/pciinit.c index 03ece34..0fba130 100644 --- a/src/pciinit.c +++ b/src/pciinit.c @@ -496,6 +496,126 @@ static int pci_bios_fill_regions(struct pci_region *regions) return 0; }
+/**************************************************************** + * Map pci region entries + ****************************************************************/ + +#define ROOT_BASE(top, sum, max) ALIGN_DOWN((top)-(sum),(max) ?: 1) +// Setup region bases (given the regions' size and alignment) +static int pci_bios_init_root_regions(struct pci_region *regions) +{ + struct pci_region *r_end, *r_start; + regions[PCI_REGION_TYPE_IO].base = 0xc000; + + r_end = ®ions[PCI_REGION_TYPE_PREFMEM]; + r_start = ®ions[PCI_REGION_TYPE_MEM]; + if (pci_region_sum(r_end) > pci_region_sum(r_start)) { + // Swap regions so larger area is more likely to align well. + r_end = r_start; + r_start = ®ions[PCI_REGION_TYPE_PREFMEM]; + } + // Out of space + if ((pci_region_sum(r_end) + pci_region_sum(r_start) > BUILD_PCIMEM_END)) + return -1; + + r_end->base = ROOT_BASE(BUILD_PCIMEM_END, pci_region_sum(r_end), + pci_region_max_size(r_end)); + r_start->base = ROOT_BASE(r_end->base, pci_region_sum(r_start), + pci_region_max_size(r_start)); + if (r_start->base < BUILD_PCIMEM_START) + // Memory range requested is larger than available... 
+ return -1; + return 0; +} + +static void +pci_region_move_64bit_entries(struct pci_region *to, struct pci_region *from) +{ + struct pci_region_entry *entry, *next; + foreach_region_entry_safe(from, next, entry) { + if (entry->is64bit) { + region_entry_del(entry); + region_entry_add(to, entry); + entry->parent_region = to; + } + } +} + +static void pci_region_map_one_entry(struct pci_region_entry *entry) +{ + if (!entry->this_region ) { + pci_set_io_region_addr(entry->dev, entry->bar, entry->base); + if (entry->is64bit) + pci_set_io_region_addr(entry->dev, entry->bar + 1, entry->base >> 32); + return; + } + + entry->this_region->base = entry->base; + u16 bdf = entry->dev->bdf; + u64 base = entry->base; + u64 limit = entry->base + entry->size - 1; + if (entry->type == PCI_REGION_TYPE_IO) { + pci_config_writeb(bdf, PCI_IO_BASE, base >> 8); + pci_config_writew(bdf, PCI_IO_BASE_UPPER16, 0); + pci_config_writeb(bdf, PCI_IO_LIMIT, limit >> 8); + pci_config_writew(bdf, PCI_IO_LIMIT_UPPER16, 0); + } + if (entry->type == PCI_REGION_TYPE_MEM) { + pci_config_writew(bdf, PCI_MEMORY_BASE, base >> 16); + pci_config_writew(bdf, PCI_MEMORY_LIMIT, limit >> 16); + } + if (entry->type == PCI_REGION_TYPE_PREFMEM) { + pci_config_writew(bdf, PCI_PREF_MEMORY_BASE, base >> 16); + pci_config_writew(bdf, PCI_PREF_MEMORY_LIMIT, limit >> 16); + pci_config_writel(bdf, PCI_PREF_BASE_UPPER32, base >> 32); + pci_config_writel(bdf, PCI_PREF_LIMIT_UPPER32, limit >> 32); + } + return; +} + +static void pci_region_map_entries(struct pci_region *r) +{ + struct pci_region_entry *entry, *next; + u64 size, max_size = pci_region_max_size(r); + + for (size = max_size; size > 0; size >>= 1) { + foreach_region_entry_safe(r, next, entry) { + if (size == entry->size) { + entry->base = r->base; + r->base += size; + dump_entry(entry); + pci_region_map_one_entry(entry); + region_entry_del(entry); + free(entry); + } + } + } +} +static int pci_bios_map_regions(struct pci_region *regions) +{ + if 
(pci_bios_init_root_regions(regions)) { + struct pci_region r64pref, r64mem; + memset(&r64pref, 0, sizeof(struct pci_region)); + memset(&r64mem, 0, sizeof(struct pci_region)); + pci_region_move_64bit_entries(&r64pref, ®ions[PCI_REGION_TYPE_PREFMEM]); + pci_region_move_64bit_entries(&r64mem, ®ions[PCI_REGION_TYPE_MEM]); + + if (pci_bios_init_root_regions(regions)) { + panic("PCI: out of address space\n"); + } + r64pref.base = BUILD_PCIMEM64_START; + r64mem.base = ALIGN((r64pref.base + pci_region_sum(&r64pref)), + pci_region_max_size(&r64mem)); + pci_region_map_entries(&r64pref); + pci_region_map_entries(&r64mem); + } + + int i; + for (i = 0; i < (MaxPCIBus + 1) * PCI_REGION_TYPE_COUNT; i++) + pci_region_map_entries(®ions[i]); + + return 0; +}
static void pci_bios_bus_reserve(struct pci_bus *bus, int type, u32 size) { @@ -744,9 +864,10 @@ pci_setup(void) }
dprintf(1, "=== PCI new allocation pass #2 ===\n"); + pci_bios_map_regions(regions); pci_bios_map_devices(busses);
pci_bios_init_devices();
- free(busses); + free(regions); }
On 03/01/12 07:57, Alexey Korolev wrote:
In pci_bios_map_regions() we try to reserve memory for all entries of root bus regions. If pci_bios_init_root_regions() fails - e.g no enough space, we create two new pci_regions: r64pref, r64mem and migrate all entries which are 64bit capable to them. Migration process is very simple: delete the entry from one list add to another.
It isn't that simple. There are a bunch of constraints. First, the bridge must be 64bit capable - all bridges up to the root bus in the case of nested bridges. Second, all other prefmem BARs of devices behind the bridge must be 64bit capable too. Again, in the case of nested bridges this applies to all devices behind the toplevel bridge.
cheers, Gerd
On 01/03/12 22:22, Gerd Hoffmann wrote:
On 03/01/12 07:57, Alexey Korolev wrote:
In pci_bios_map_regions() we try to reserve memory for all entries of root bus regions. If pci_bios_init_root_regions() fails - e.g no enough space, we create two new pci_regions: r64pref, r64mem and migrate all entries which are 64bit capable to them. Migration process is very simple: delete the entry from one list add to another.
It isn't that simple. There are a bunch of constrains. First the bridge must be 64bit capable. All bridges up to the root bus in case of nested bridges. Second all other prefmem bars of devices behind the bridge must be 64bit capable too. Again, in case of nested bridges this applies to all devices behind the toplevel bridge.
It really is that simple, because we derive the 64bit flag from the devices behind the bridge at the topology-building stage.
In other words, if an entry still has the 64bit flag at the mapping stage, then the entry and all entries behind it are 64bit capable.
+ for (i = (MaxPCIBus + 1) * PCI_REGION_TYPE_COUNT ; i < 0; i--) { + ................. + is64bit = this_entry->is64bit; + size = 0; + foreach_region_entry(®ions[i-1], entry) { + size += entry->size; + is64bit &= entry->is64bit; + } ................. + this_entry->is64bit = is64bit;
On 03/01/12 23:01, Alexey Korolev wrote:
On 01/03/12 22:22, Gerd Hoffmann wrote:
On 03/01/12 07:57, Alexey Korolev wrote:
In pci_bios_map_regions() we try to reserve memory for all entries of root bus regions. If pci_bios_init_root_regions() fails - e.g no enough space, we create two new pci_regions: r64pref, r64mem and migrate all entries which are 64bit capable to them. Migration process is very simple: delete the entry from one list add to another.
It isn't that simple. There are a bunch of constrains. First the bridge must be 64bit capable. All bridges up to the root bus in case of nested bridges. Second all other prefmem bars of devices behind the bridge must be 64bit capable too. Again, in case of nested bridges this applies to all devices behind the toplevel bridge.
It must be simple as we derive 64bit flag from devices behind the bridge at the stage of building topology.
In other words if the entry at the stage of mapping has 64bit flag it means that the entry and all the entries behind it are 64bit capable.
- for (i = (MaxPCIBus + 1) * PCI_REGION_TYPE_COUNT ; i < 0; i--) {
.................
is64bit = this_entry->is64bit;
size = 0;
foreach_region_entry(®ions[i-1], entry) {
size += entry->size;
is64bit &= entry->is64bit;
.................}
this_entry->is64bit = is64bit;
This pass looks at the children and clears the is64bit in the parent in case one of the children isn't 64bit capable.
I think you need a second pass here, clearing the is64bit in all children in case the parent has is64bit cleared.
cheers, Gerd
On 02/03/12 20:21, Gerd Hoffmann wrote:
On 03/01/12 23:01, Alexey Korolev wrote:
On 01/03/12 22:22, Gerd Hoffmann wrote:
On 03/01/12 07:57, Alexey Korolev wrote:
In pci_bios_map_regions() we try to reserve memory for all entries of root bus regions. If pci_bios_init_root_regions() fails - e.g no enough space, we create two new pci_regions: r64pref, r64mem and migrate all entries which are 64bit capable to them. Migration process is very simple: delete the entry from one list add to another.
It isn't that simple. There are a bunch of constrains. First the bridge must be 64bit capable. All bridges up to the root bus in case of nested bridges. Second all other prefmem bars of devices behind the bridge must be 64bit capable too. Again, in case of nested bridges this applies to all devices behind the toplevel bridge.
It must be simple as we derive 64bit flag from devices behind the bridge at the stage of building topology.
In other words if the entry at the stage of mapping has 64bit flag it means that the entry and all the entries behind it are 64bit capable.
- for (i = (MaxPCIBus + 1) * PCI_REGION_TYPE_COUNT ; i < 0; i--) {
.................
is64bit = this_entry->is64bit;
size = 0;
foreach_region_entry(®ions[i-1], entry) {
size += entry->size;
is64bit &= entry->is64bit;
.................}
this_entry->is64bit = is64bit;
This pass looks at the children and clears the is64bit in the parent in case one of the children isn't 64bit capable.
I think you need a second pass here, clearing the is64bit in all children in case the parent has is64bit cleared.
It is unnecessary: since the parent is not 64bit capable, neither the parent nor its children will ever be mapped to a 64bit range. One thing that was missing in my code was reading the bridge capabilities - I've added that.
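For reference, a minimal sketch of such a bridge-capability check (my own illustration, not the posted code): on a PCI-to-PCI bridge, the low bits of the prefetchable-base register report whether the prefetchable window decodes 64bit addresses (PCI_PREF_RANGE_TYPE_64 == 0x1, per the standard bridge header layout).

```c
/* Standard PCI-to-PCI bridge register values (as in Linux pci_regs.h). */
#define PCI_PREF_MEMORY_BASE     0x24
#define PCI_PREF_RANGE_TYPE_MASK 0x0f
#define PCI_PREF_RANGE_TYPE_64   0x01

/* Caller is assumed to have read the 16bit PCI_PREF_MEMORY_BASE register,
 * e.g. with seabios' pci_config_readw(bdf, PCI_PREF_MEMORY_BASE). */
static int bridge_pref_is_64bit(unsigned short pref_base_reg)
{
    return (pref_base_reg & PCI_PREF_RANGE_TYPE_MASK) == PCI_PREF_RANGE_TYPE_64;
}
```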
cheers, Gerd
Delete old code.
Signed-off-by: Alexey Korolev <alexey.korolev@endace.com>
---
 src/pciinit.c |  212 ---------------------------------------------------------
 1 files changed, 0 insertions(+), 212 deletions(-)
diff --git a/src/pciinit.c b/src/pciinit.c index 0fba130..9c41e3c 100644 --- a/src/pciinit.c +++ b/src/pciinit.c @@ -617,208 +617,6 @@ static int pci_bios_map_regions(struct pci_region *regions) return 0; }
-static void pci_bios_bus_reserve(struct pci_bus *bus, int type, u32 size) -{ - u32 index; - - index = pci_size_to_index(size, type); - size = pci_index_to_size(index, type); - bus->r[type].count[index]++; - bus->r[type].sum += size; - if (bus->r[type].max < size) - bus->r[type].max = size; -} - -static void pci_bios_check_devices(struct pci_bus *busses) -{ - dprintf(1, "PCI: check devices\n"); - - // Calculate resources needed for regular (non-bus) devices. - struct pci_device *pci; - foreachpci(pci) { - if (pci->class == PCI_CLASS_BRIDGE_PCI) { - busses[pci->secondary_bus].bus_dev = pci; - continue; - } - struct pci_bus *bus = &busses[pci_bdf_to_bus(pci->bdf)]; - int i; - for (i = 0; i < PCI_NUM_REGIONS; i++) { - u32 val, size; - pci_bios_get_bar(pci, i, &val, &size); - if (val == 0) - continue; - - pci_bios_bus_reserve(bus, pci_addr_to_type(val), size); - pci->bars[i].addr = val; - pci->bars[i].size = size; - pci->bars[i].is64 = (!(val & PCI_BASE_ADDRESS_SPACE_IO) && - (val & PCI_BASE_ADDRESS_MEM_TYPE_MASK) - == PCI_BASE_ADDRESS_MEM_TYPE_64); - - if (pci->bars[i].is64) - i++; - } - } - - // Propagate required bus resources to parent busses. - int secondary_bus; - for (secondary_bus=MaxPCIBus; secondary_bus>0; secondary_bus--) { - struct pci_bus *s = &busses[secondary_bus]; - if (!s->bus_dev) - continue; - struct pci_bus *parent = &busses[pci_bdf_to_bus(s->bus_dev->bdf)]; - int type; - for (type = 0; type < PCI_REGION_TYPE_COUNT; type++) { - u32 limit = (type == PCI_REGION_TYPE_IO) ? 
- PCI_BRIDGE_IO_MIN : PCI_BRIDGE_MEM_MIN; - s->r[type].size = s->r[type].sum; - if (s->r[type].size < limit) - s->r[type].size = limit; - s->r[type].size = pci_size_roundup(s->r[type].size); - pci_bios_bus_reserve(parent, type, s->r[type].size); - } - dprintf(1, "PCI: secondary bus %d sizes: io %x, mem %x, prefmem %x\n", - secondary_bus, - s->r[PCI_REGION_TYPE_IO].size, - s->r[PCI_REGION_TYPE_MEM].size, - s->r[PCI_REGION_TYPE_PREFMEM].size); - } -} - -#define ROOT_BASE(top, sum, max) ALIGN_DOWN((top)-(sum),(max) ?: 1) - -// Setup region bases (given the regions' size and alignment) -static int pci_bios_init_root_regions(struct pci_bus *bus, u32 start, u32 end) -{ - bus->r[PCI_REGION_TYPE_IO].base = 0xc000; - - int reg1 = PCI_REGION_TYPE_PREFMEM, reg2 = PCI_REGION_TYPE_MEM; - if (bus->r[reg1].sum < bus->r[reg2].sum) { - // Swap regions so larger area is more likely to align well. - reg1 = PCI_REGION_TYPE_MEM; - reg2 = PCI_REGION_TYPE_PREFMEM; - } - bus->r[reg2].base = ROOT_BASE(end, bus->r[reg2].sum, bus->r[reg2].max); - bus->r[reg1].base = ROOT_BASE(bus->r[reg2].base, bus->r[reg1].sum - , bus->r[reg1].max); - if (bus->r[reg1].base < start) - // Memory range requested is larger than available. 
-        return -1;
-    return 0;
-}
-
-
-/****************************************************************
- * BAR assignment
- ****************************************************************/
-
-static void pci_bios_init_bus_bases(struct pci_bus *bus)
-{
-    u32 base, newbase, size;
-    int type, i;
-
-    for (type = 0; type < PCI_REGION_TYPE_COUNT; type++) {
-        dprintf(1, " type %s max %x sum %x base %x\n", region_type_name[type],
-                bus->r[type].max, bus->r[type].sum, bus->r[type].base);
-        base = bus->r[type].base;
-        for (i = ARRAY_SIZE(bus->r[type].count)-1; i >= 0; i--) {
-            size = pci_index_to_size(i, type);
-            if (!bus->r[type].count[i])
-                continue;
-            newbase = base + size * bus->r[type].count[i];
-            dprintf(1, " size %8x: %d bar(s), %8x -> %8x\n",
-                    size, bus->r[type].count[i], base, newbase - 1);
-            bus->r[type].bases[i] = base;
-            base = newbase;
-        }
-    }
-}
-
-static u32 pci_bios_bus_get_addr(struct pci_bus *bus, int type, u32 size)
-{
-    u32 index, addr;
-
-    index = pci_size_to_index(size, type);
-    addr = bus->r[type].bases[index];
-    bus->r[type].bases[index] += pci_index_to_size(index, type);
-    return addr;
-}
-
-#define PCI_IO_SHIFT 8
-#define PCI_MEMORY_SHIFT 16
-#define PCI_PREF_MEMORY_SHIFT 16
-
-static void pci_bios_map_devices(struct pci_bus *busses)
-{
-    // Setup bases for root bus.
-    dprintf(1, "PCI: init bases bus 0 (primary)\n");
-    pci_bios_init_bus_bases(&busses[0]);
-
-    // Map regions on each secondary bus.
-    int secondary_bus;
-    for (secondary_bus=1; secondary_bus<=MaxPCIBus; secondary_bus++) {
-        struct pci_bus *s = &busses[secondary_bus];
-        if (!s->bus_dev)
-            continue;
-        u16 bdf = s->bus_dev->bdf;
-        struct pci_bus *parent = &busses[pci_bdf_to_bus(bdf)];
-        int type;
-        for (type = 0; type < PCI_REGION_TYPE_COUNT; type++) {
-            s->r[type].base = pci_bios_bus_get_addr(
-                parent, type, s->r[type].size);
-        }
-        dprintf(1, "PCI: init bases bus %d (secondary)\n", secondary_bus);
-        pci_bios_init_bus_bases(s);
-
-        u32 base = s->r[PCI_REGION_TYPE_IO].base;
-        u32 limit = base + s->r[PCI_REGION_TYPE_IO].size - 1;
-        pci_config_writeb(bdf, PCI_IO_BASE, base >> PCI_IO_SHIFT);
-        pci_config_writew(bdf, PCI_IO_BASE_UPPER16, 0);
-        pci_config_writeb(bdf, PCI_IO_LIMIT, limit >> PCI_IO_SHIFT);
-        pci_config_writew(bdf, PCI_IO_LIMIT_UPPER16, 0);
-
-        base = s->r[PCI_REGION_TYPE_MEM].base;
-        limit = base + s->r[PCI_REGION_TYPE_MEM].size - 1;
-        pci_config_writew(bdf, PCI_MEMORY_BASE, base >> PCI_MEMORY_SHIFT);
-        pci_config_writew(bdf, PCI_MEMORY_LIMIT, limit >> PCI_MEMORY_SHIFT);
-
-        base = s->r[PCI_REGION_TYPE_PREFMEM].base;
-        limit = base + s->r[PCI_REGION_TYPE_PREFMEM].size - 1;
-        pci_config_writew(bdf, PCI_PREF_MEMORY_BASE, base >> PCI_PREF_MEMORY_SHIFT);
-        pci_config_writew(bdf, PCI_PREF_MEMORY_LIMIT, limit >> PCI_PREF_MEMORY_SHIFT);
-        pci_config_writel(bdf, PCI_PREF_BASE_UPPER32, 0);
-        pci_config_writel(bdf, PCI_PREF_LIMIT_UPPER32, 0);
-    }
-
-    // Map regions on each device.
-    struct pci_device *pci;
-    foreachpci(pci) {
-        if (pci->class == PCI_CLASS_BRIDGE_PCI)
-            continue;
-        u16 bdf = pci->bdf;
-        dprintf(1, "PCI: map device bdf=%02x:%02x.%x\n"
-                , pci_bdf_to_bus(bdf), pci_bdf_to_dev(bdf), pci_bdf_to_fn(bdf));
-        struct pci_bus *bus = &busses[pci_bdf_to_bus(bdf)];
-        int i;
-        for (i = 0; i < PCI_NUM_REGIONS; i++) {
-            if (pci->bars[i].addr == 0)
-                continue;
-
-            int type = pci_addr_to_type(pci->bars[i].addr);
-            u32 addr = pci_bios_bus_get_addr(bus, type, pci->bars[i].size);
-            dprintf(1, " bar %d, addr %x, size %x [%s]\n",
-                    i, addr, pci->bars[i].size, region_type_name[type]);
-            pci_set_io_region_addr(pci, i, addr);
-
-            if (pci->bars[i].is64) {
-                i++;
-                pci_set_io_region_addr(pci, i, 0);
-            }
-        }
-    }
-}
-

 /****************************************************************
  * Main setup code
  ****************************************************************/
@@ -833,10 +631,6 @@ pci_setup(void)
 }
     dprintf(3, "pci setup\n");
-
-    u32 start = BUILD_PCIMEM_START;
-    u32 end = BUILD_PCIMEM_END;
-
     dprintf(1, "=== PCI bus & bridge init ===\n");
     if (pci_probe_host() != 0) {
         return;
@@ -858,14 +652,8 @@ pci_setup(void)
         free(regions);
         return;
     }
-    pci_bios_check_devices(busses);
-    if (pci_bios_init_root_regions(&busses[0], start, end) != 0) {
-        panic("PCI: out of address space\n");
-    }
-
     dprintf(1, "=== PCI new allocation pass #2 ===\n");
     pci_bios_map_regions(regions);
-    pci_bios_map_devices(busses);
     pci_bios_init_devices();
This patch was originally proposed by Michael, to solve issues I've seen on Windows guests when 64bit BARs are present. This patch also might be helpful on Linux guests when the use_crs kernel boot option is set.
Signed-off-by: Alexey Korolev <alexey.korolev@endace.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---
 src/acpi-dsdt.dsl |    7 +++++
 src/acpi-dsdt.hex |   72 +++++++++++++++++++++++++++++++++++++++++++---------
 2 files changed, 66 insertions(+), 13 deletions(-)
diff --git a/src/acpi-dsdt.dsl b/src/acpi-dsdt.dsl
index 7082b65..c17e947 100644
--- a/src/acpi-dsdt.dsl
+++ b/src/acpi-dsdt.dsl
@@ -175,6 +175,13 @@ DefinitionBlock (
             0x00000000,         // Address Translation Offset
             0x1EC00000,         // Address Length
             ,, , AddressRangeMemory, TypeStatic)
+        QWordMemory (ResourceProducer, PosDecode, MinFixed, MaxFixed, Cacheable, ReadWrite,
+            0x00000000,         // Address Space Granularity
+            0x8000000000,       // Address Range Minimum
+            0xFFFFFFFFFF,       // Address Range Maximum
+            0x00000000,         // Address Translation Offset
+            0x8000000000,       // Address Length
+            ,, , AddressRangeMemory, TypeStatic)
     })
 }
 }
diff --git a/src/acpi-dsdt.hex b/src/acpi-dsdt.hex
index 5dc7bb4..2393827 100644
--- a/src/acpi-dsdt.hex
+++ b/src/acpi-dsdt.hex
@@ -3,12 +3,12 @@ static unsigned char AmlCode[] = {
 0x53,
 0x44,
 0x54,
-0xd3,
-0x10,
+0x1,
+0x11,
 0x0,
 0x0,
 0x1,
-0x2d,
+0x1e,
 0x42,
 0x58,
 0x50,
@@ -31,9 +31,9 @@ static unsigned char AmlCode[] = {
 0x4e,
 0x54,
 0x4c,
-0x28,
-0x5,
-0x10,
+0x23,
+0x1,
+0x9,
 0x20,
 0x10,
 0x49,
@@ -110,16 +110,16 @@ static unsigned char AmlCode[] = {
 0x47,
 0x42,
 0x10,
-0x44,
-0x81,
+0x42,
+0x84,
 0x5f,
 0x53,
 0x42,
 0x5f,
 0x5b,
 0x82,
-0x4c,
-0x80,
+0x4a,
+0x83,
 0x50,
 0x43,
 0x49,
@@ -2064,10 +2064,10 @@ static unsigned char AmlCode[] = {
 0x52,
 0x53,
 0x11,
-0x42,
-0x7,
+0x40,
+0xa,
 0xa,
-0x6e,
+0x9c,
 0x88,
 0xd,
 0x0,
@@ -2176,6 +2176,52 @@ static unsigned char AmlCode[] = {
 0x0,
 0xc0,
 0x1e,
+0x8a,
+0x2b,
+0x0,
+0x0,
+0xc,
+0x3,
+0x0,
+0x0,
+0x0,
+0x0,
+0x0,
+0x0,
+0x0,
+0x0,
+0x0,
+0x0,
+0x0,
+0x0,
+0x80,
+0x0,
+0x0,
+0x0,
+0xff,
+0xff,
+0xff,
+0xff,
+0xff,
+0x0,
+0x0,
+0x0,
+0x0,
+0x0,
+0x0,
+0x0,
+0x0,
+0x0,
+0x0,
+0x0,
+0x0,
+0x0,
+0x0,
+0x0,
+0x80,
+0x0,
+0x0,
+0x0,
 0x79,
 0x0,
 0x10,
Hi,
a) Provide 64bit bar support for PCI BARs and bridges with 64bit memory window.
Bridge support seems to be completely untested. /me has a test setup using mst's bridge patches which looks like this:
[root@fedora ~]# lspci -tv
-[0000:00]-+-00.0  Intel Corporation 440FX - 82441FX PMC [Natoma]
           +-01.0  Intel Corporation 82371SB PIIX3 ISA [Natoma/Triton II]
           +-01.1  Intel Corporation 82371SB PIIX3 IDE [Natoma/Triton II]
           +-01.2  Intel Corporation 82371SB PIIX3 USB [Natoma/Triton II]
           +-01.3  Intel Corporation 82371AB/EB/MB PIIX4 ACPI
           +-02.0  Red Hat, Inc. Device 0100
           +-03.0  Red Hat, Inc Virtio network device
           +-05.0  Red Hat, Inc Virtio console
           +-06.0  Intel Corporation 82801AA AC'97 Audio Controller
           +-0a.0  Red Hat, Inc Device 1009
           +-0b.0  Red Hat, Inc Device 1009
           +-10.0-[01]----01.0  Red Hat, Inc Device 1110
           +-10.1-[02]----01.0  Red Hat, Inc Virtio memory balloon
           +-10.2-[03]--
           +-10.3-[04]--
           +-11.0-[05-07]--+-01.0-[06]--+-01.0  Red Hat, Inc Device 1110
           |               |            \-02.0  Red Hat, Inc Device 1110
           |               \-02.0-[07]--+-01.0  Red Hat, Inc Device 1110
           |                            \-02.0  Red Hat, Inc Device 1110
           +-1d.0  Intel Corporation 82801I (ICH9 Family) USB UHCI Controller #1
           +-1d.1  Intel Corporation 82801I (ICH9 Family) USB UHCI Controller #2
           +-1d.2  Intel Corporation 82801I (ICH9 Family) USB UHCI Controller #3
           +-1d.7  Intel Corporation 82801I (ICH9 Family) USB2 EHCI Controller #1
           \-1f.0  Intel Corporation 82801IR/IO/IH (ICH9R/DO/DH) 6 port SATA AHCI Controller
The linux kernel completely redoes the bridge resource assignment. The seabios log looks sane for the root bus devices but not for devices behind bridges:
=== PCI new allocation pass #2 ===
PCI region entry: bdf=00:06.0 BAR 0 base=0x000000000000c000 size=0x0000000000000400 [io, 32bits]
PCI region entry: bdf=00:06.0 BAR 1 base=0x000000000000c400 size=0x0000000000000100 [io, 32bits]
PCI region entry: bdf=00:0b.0 BAR 0 base=0x000000000000c500 size=0x0000000000000040 [io, 32bits]
PCI region entry: bdf=00:0a.0 BAR 0 base=0x000000000000c540 size=0x0000000000000040 [io, 32bits]
PCI region entry: bdf=00:1f.0 BAR 4 base=0x000000000000c580 size=0x0000000000000020 [io, 32bits]
PCI region entry: bdf=00:1d.2 BAR 4 base=0x000000000000c5a0 size=0x0000000000000020 [io, 32bits]
PCI region entry: bdf=00:1d.1 BAR 4 base=0x000000000000c5c0 size=0x0000000000000020 [io, 32bits]
PCI region entry: bdf=00:1d.0 BAR 4 base=0x000000000000c5e0 size=0x0000000000000020 [io, 32bits]
PCI region entry: bdf=00:05.0 BAR 0 base=0x000000000000c600 size=0x0000000000000020 [io, 32bits]
PCI region entry: bdf=00:03.0 BAR 0 base=0x000000000000c620 size=0x0000000000000020 [io, 32bits]
PCI region entry: bdf=00:02.0 BAR 3 base=0x000000000000c640 size=0x0000000000000020 [io, 32bits]
PCI region entry: bdf=00:01.2 BAR 4 base=0x000000000000c660 size=0x0000000000000020 [io, 32bits]
PCI region entry: bdf=00:01.1 BAR 4 base=0x000000000000c680 size=0x0000000000000010 [io, 32bits]
PCI region entry: bdf=00:02.0 BAR 0 base=0x00000000f8000000 size=0x0000000004000000 [mem, 32bits]
PCI region entry: bdf=00:02.0 BAR 1 base=0x00000000fc000000 size=0x0000000001000000 [mem, 32bits]
PCI region entry: bdf=00:03.0 BAR 6 base=0x00000000fd000000 size=0x0000000000010000 [mem, 32bits]
PCI region entry: bdf=00:02.0 BAR 6 base=0x00000000fd010000 size=0x0000000000010000 [mem, 32bits]
PCI region entry: bdf=00:02.0 BAR 2 base=0x00000000fd020000 size=0x0000000000002000 [mem, 32bits]
PCI region entry: bdf=00:1f.0 BAR 5 base=0x00000000fd022000 size=0x0000000000001000 [mem, 32bits]
PCI region entry: bdf=00:1d.7 BAR 0 base=0x00000000fd023000 size=0x0000000000001000 [mem, 32bits]
PCI region entry: bdf=00:0b.0 BAR 1 base=0x00000000fd024000 size=0x0000000000001000 [mem, 32bits]
PCI region entry: bdf=00:0a.0 BAR 1 base=0x00000000fd025000 size=0x0000000000001000 [mem, 32bits]
PCI region entry: bdf=00:05.0 BAR 1 base=0x00000000fd026000 size=0x0000000000001000 [mem, 32bits]
PCI region entry: bdf=00:03.0 BAR 1 base=0x00000000fd027000 size=0x0000000000001000 [mem, 32bits]
PCI region entry: bdf=00:02.0 BAR 4 base=0x00000000e0000000 size=0x0000000010000000 [prefmem, 64bits]
PCI region entry: bdf=01:01.0 BAR 0 base=0x0000000000000000 size=0x0000000000001000 [mem, 32bits]
PCI region entry: bdf=01:01.0 BAR 2 base=0x0000000000000000 size=0x0000000040000000 [prefmem, 64bits]
PCI region entry: bdf=02:01.0 BAR 0 base=0x0000000000000000 size=0x0000000000000020 [io, 32bits]
PCI region entry: bdf=06:02.0 BAR 0 base=0x0000000000000000 size=0x0000000000001000 [mem, 32bits]
PCI region entry: bdf=06:01.0 BAR 0 base=0x0000000000001000 size=0x0000000000001000 [mem, 32bits]
PCI region entry: bdf=06:02.0 BAR 2 base=0x0000000000000000 size=0x0000000002000000 [prefmem, 64bits]
PCI region entry: bdf=06:01.0 BAR 2 base=0x0000000002000000 size=0x0000000002000000 [prefmem, 64bits]
PCI region entry: bdf=07:02.0 BAR 0 base=0x0000000000000000 size=0x0000000000001000 [mem, 32bits]
PCI region entry: bdf=07:01.0 BAR 0 base=0x0000000000001000 size=0x0000000000001000 [mem, 32bits]
PCI region entry: bdf=07:02.0 BAR 2 base=0x0000000000000000 size=0x0000000002000000 [prefmem, 64bits]
PCI region entry: bdf=07:01.0 BAR 2 base=0x0000000002000000 size=0x0000000002000000 [prefmem, 64bits]
cheers, Gerd
Hi,
What is your setup? I want to reproduce this case.
On 03/01/12 22:48, Alexey Korolev wrote:
Hi, What is your setup? I want to reproduce this case
qemu: latest master with a few patches (mst's bridge patches, pci64 fixes from me, posted to qemu-devel a few days ago), bundle pushed to http://www.kraxel.org/cgit/qemu/log/?h=pci64 for your convenience.
See the attached config file (which you can pass to qemu via -readconfig) for the bridge setup.
cheers, Gerd
Thank you for this! Yes, I have reproduced the issues. Fortunately they are just minor problems:
1. A typo in a for loop.
2. A silly bug in the pci_size_roundup function.
3. Missing code for reading the 64bit capability of the bridge.
I can either send a patch over existing patches, or send new series or both.
Hi,
I can either send a patch over existing patches, or send new series or both.
For testing an incremental patch is fine; for merge, a new series with the fixes squashed into the buggy patches is needed.
cheers, Gerd
Sure. Here are the hot fixes for the "bridge" test, please apply the patch over this series:
diff --git a/src/pciinit.c b/src/pciinit.c
index 9c41e3c..384209d 100644
--- a/src/pciinit.c
+++ b/src/pciinit.c
@@ -326,10 +326,14 @@ pci_bios_init_bus(void)
 static u64 pci_size_roundup(u64 size)
 {
-    int index = __fls((u32)((size - 1) >> 32));
-    if (!index)
-        index = __fls((u32)(size - 1));
-    return 0x1 << (index + 1);
+    int rest = !!(size & (size - 1));
+    int index;
+    if (size >> 32) {
+        index = __fls((u32)(size >> 32));
+        return 0x1ULL << (index + rest + 32);
+    }
+    index = __fls((u32)(size));
+    return 0x1ULL << (index + rest);
 }
 static u64
@@ -372,6 +376,18 @@ pci_get_bar_size(struct pci_device *pci, int bar,
     return (u32)((~(sz & mask)) + 1);
 }
+static int pci_bridge_is64bit(struct pci_device *pci)
+{
+    u32 pmem = pci_config_readl(pci->bdf, PCI_PREF_MEMORY_BASE);
+    if (!pmem) {
+        pci_config_writel(pci->bdf, PCI_PREF_MEMORY_BASE, 0xfff0fff0);
+        pmem = pci_config_readl(pci->bdf, PCI_PREF_MEMORY_BASE);
+        pci_config_writel(pci->bdf, PCI_PREF_MEMORY_BASE, 0x0);
+    }
+    if ((pmem & PCI_PREF_RANGE_TYPE_MASK) == PCI_PREF_RANGE_TYPE_64)
+        return 1;
+    return 0;
+}
 static u64 pci_region_max_size(struct pci_region *r)
 {
     u64 max = 0;
@@ -446,7 +462,9 @@ static int pci_bios_fill_regions(struct pci_region *regions)
     for (type = 0; type < PCI_REGION_TYPE_COUNT; type++, this_region++, parent++) {
         /* Only prefetchable bridge regions can be 64bit */
-        is64bit = (type == PCI_REGION_TYPE_PREFMEM);
+        is64bit = 0;
+        if (type == PCI_REGION_TYPE_PREFMEM)
+            is64bit = pci_bridge_is64bit(pci);
         entry = pci_region_create_entry(parent, pci, 0, type, is64bit);
         if (!entry)
             return -1;
@@ -475,7 +493,7 @@ static int pci_bios_fill_regions(struct pci_region *regions)
     }
 }
-    for (i = (MaxPCIBus + 1) * PCI_REGION_TYPE_COUNT ; i < 0; i--) {
+    for (i = (MaxPCIBus + 1) * PCI_REGION_TYPE_COUNT; i > 0; i--) {
         struct pci_region_entry *this_entry = regions[i-1].this_entry;
         if(!this_entry)
             continue;
@@ -491,7 +509,7 @@ static int pci_bios_fill_regions(struct pci_region *regions)
         size = (size > min_size) ? size : min_size;
         this_entry->is64bit = is64bit;
         this_entry->size = pci_size_roundup(size);
-        dump_entry(entry);
+        dump_entry(this_entry);
     }
     return 0;
 }
On Thu, Mar 01, 2012 at 06:50:43PM +1300, Alexey Korolev wrote:
Hi,
This patch series enables 64bit BAR support in seabios. It has a bit different approach for resources accounting, We did this because we wanted: a) Provide 64bit bar support for PCI BARs and bridges with 64bit memory window. b) Allow migration to 64bit bit ranges if we did not fit into 32bit range c) Keep implementation simple.
Hrmm. By my count, this would be the third "rewrite" of the PCI bar initialization in the last 14 months.
[...]
The patch series includes 6 patches. In the 1st patch we introduce new structures.
Patch 1 does not look like it will compile independently. There is no point in breaking up patches if each part doesn't compile.
In the 2nd patch we introduce support functions for basic hlist operations, plus modify service functions to support 64bits address ranges. Note: I've seen similar hlist operations in post memory manager and stack location operations, it makes sense to move them to a header file.
In the 3rd patch a new function to fill pci_region structures with entries, and discover topology is added.
In the 4th patch we define address range for pci_region structure, migrate entries to 64bits address range if necessary, and program PCI BAR addresses and bridge regions.
In the 6th patch we clear old code.
Given the churn in this area, I don't want to commit patches that do wholesale code replacement. I'd prefer to see each patch independently add some functionality and perform its related cleanup.
Also, since Gerd has some patches pending in this area, we should figure out which direction makes sense. Can you explain on how this 64bit support is different from the support proposed by Gerd?
Thanks, -Kevin
On Thu, Mar 01, 2012 at 06:50:43PM +1300, Alexey Korolev wrote:
Hi,
This patch series enables 64bit BAR support in seabios. It has a bit different approach for resources accounting, We did this because we wanted: a) Provide 64bit bar support for PCI BARs and bridges with 64bit memory window. b) Allow migration to 64bit bit ranges if we did not fit into 32bit range c) Keep implementation simple.
Hrmm. By my count, this would be the third "rewrite" of the PCI bar initialization in the last 14 months.
Correct, it could be called a "rewrite"
[...]
The patch series includes 6 patches. In the 1st patch we introduce new structures.
Patch 1 does not look like it will compile independently. There is no point in breaking up patches if each part doesn't compile.
Sorry about this. In my case it was difficult to do as the changes are extensive and I wanted to explain how they work. Probably the best approach would be splitting into two series.
In the 2nd patch we introduce support functions for basic hlist operations, plus modify service functions to support 64bits address ranges. Note: I've seen similar hlist operations in post memory manager and stack location operations, it makes sense to move them to a header file.
In the 3rd patch a new function to fill pci_region structures with entries, and discover topology is added.
In the 4th patch we define address range for pci_region structure, migrate entries to 64bits address range if necessary, and program PCI BAR addresses and bridge regions.
In the 6th patch we clear old code.
Given the churn in this area, I don't want to commit patches that do wholesale code replacement. I'd prefer to see each patch independently add some functionality and perform its related cleanup.
I understand your point, will rework. Would it be reasonable if I send one patch series for redesign of existing implementation and another one for 64bit support?
Also, since Gerd has some patches pending in this area, we should figure out which direction makes sense. Can you explain on how this 64bit support is different from the support proposed by Gerd?
Ah it's a difficult thing, I don't want to criticise. You'll hate me :).
I think Gerd's implementation is about keeping the existing approach and adding 64bit BAR support through incremental changes. That is reasonable, but it causes the code to be bulky, and adding extra types (PCI_REGION_TYPE_PREFMEM64) is a bit misleading.
1. Gerd's implementation creates extra new region types:
static const char *region_type_name[] = {
    [ PCI_REGION_TYPE_IO ]        = "io",
    [ PCI_REGION_TYPE_MEM ]       = "mem",
    [ PCI_REGION_TYPE_PREFMEM ]   = "prefmem",
    [ PCI_REGION_TYPE_PREFMEM64 ] = "prefmem64",
};
Note: if we are going to support 64bit PCI non-prefetchable BARs on the root bus, we have to add yet another type, PCI_REGION_TYPE_MEM64. In reality there are just 3 types (PCI_REGION_TYPE_IO, PCI_REGION_TYPE_MEM, PCI_REGION_TYPE_PREFMEM), so adding new types makes the code bulkier and less straightforward.
In my implementation there is no need to create extra new region types, we are still using standard types.
2. If we are going to keep the existing approach and want to support 64bit bridges and BARs with sizes over 4GB, we need to extend the arrays and rewrite all the bus sizing functions:
     struct {
         /* pci region stats */
         u32 count[32 - PCI_MEM_INDEX_SHIFT];
-        u32 sum, max;
+        u64 sum, max;
         /* seconday bus region sizes */
-        u32 size;
+        u64 size;
         /* pci region assignments */
-        u32 bases[32 - PCI_MEM_INDEX_SHIFT];
-        u32 base;
+        u64 bases[32 - PCI_MEM_INDEX_SHIFT];
+        u64 base;
     } r[PCI_REGION_TYPE_COUNT];
In other words, the bound of the "bases" and "count" arrays must become 40 (or 39) instead of 32. Plus we have to rewrite the index-to-size, size-to-index, size-roundup and similar functions to make them 64bit capable.

My implementation does this without adding much code; we just declare the size of each entry.
3. About RAM use (may not be important), but let's count: 5 region types * (40 - PCI_MEM_INDEX_SHIFT) * (sizeof(u64) + sizeof(u32)) = 1680 bytes for each bus. If we don't bother about PCI_REGION_TYPE_MEM64 it will be 1344 bytes for each bus. I didn't count other members, so the size will be even more. Plus we allocate 20 bytes * PCI_NUM_REGIONS = 140 bytes for each pci device.

In my approach we are using just 16 bytes for each region, so 48 bytes per bus, and 48 bytes for each entry.
In conclusion:
Gerd's approach:
 src/pci.h     |   14 ++-
 src/pciinit.c |  272 +++++++++++++++++++++++++++++++++++++++++----------------
 2 files changed, 206 insertions(+), 82 deletions(-)
+ Fewer marginal changes in the code
- Bulkier code
- Doesn't support PCI BARs and bridges with size over 4GB.
- Doesn't support 64bit non-prefetchable BARs on the root bus.
My approach:
 src/pci.h     |    6 -
 src/pciinit.c |  509 ++++++++++++++++++++++++++++++++-------------------------
 2 files changed, 284 insertions(+), 231 deletions(-)
- More marginal changes in the code
+ Less bulky code
+ Supports PCI BARs and bridges with size over 4GB.
+ Supports 64bit non-prefetchable BARs on the root bus.
I would certainly welcome Gerd's opinions on the two approaches as well.
On Mon, Mar 05, 2012 at 07:03:08PM +1300, Alexey Korolev wrote:
On Thu, Mar 01, 2012 at 06:50:43PM +1300, Alexey Korolev wrote:
In the 6th patch we clear old code.
Given the churn in this area, I don't want to commit patches that do wholesale code replacement. I'd prefer to see each patch independently add some functionality and perform its related cleanup.
I understand your point, will rework. Would it be reasonable if I send one patch series for redesign of existing implementation and another one for 64bit support?
Sending two patch series would be preferable. One series that converts the existing functionality to your new data structures, and one series that adds new capabilities.
Also, since Gerd has some patches pending in this area, we should figure out which direction makes sense. Can you explain on how this 64bit support is different from the support proposed by Gerd?
Ah it's a difficult thing, I don't want to criticise. You'll hate me :).
I think Gerd's implementation is about saving existing approach and having 64bit BARs support with incremental sort of changes. It is reasonable, but causes the code to be bulky, and adding extra types (PCI_REGION_TYPE_PREFMEM64) is a bit misleading.
I agree that PCI_REGION_TYPE_PREFMEM64 doesn't make sense to me.
- About RAM use (may not be important), but lets count:
FYI - ram use doesn't really matter. In any reasonable config there will be multiple megabytes of ram available. The ram allocated with malloc_tmp is not reserved beyond the BIOS init phase.
-Kevin
Hi,
Hrmm. By my count, this would be the third "rewrite" of the PCI bar initialization in the last 14 months.
Indeed.
Given the churn in this area, I don't want to commit patches that do wholesale code replacement. I'd prefer to see each patch independently add some functionality and perform its related cleanup.
Hardly doable, the algorithms are very different.
Also, since Gerd has some patches pending in this area, we should figure out which direction makes sense. Can you explain on how this 64bit support is different from the support proposed by Gerd?
My code keeps all state needed to do the pci bar allocation in the pci_bus struct. It counts how many bars of each type+size it has, then uses this for the allocation. It doesn't need per-device state. The logic is a bit twisted because of that. Main reason for this is that I wrote it before "struct pci_device" showed up in seabios (although the merge was afterwards).
Alexey's code takes a very different route: It uses pci_device data structure instead and organizes the pci bars in per-region lists. It makes sense to do that, I think that version is easier to understand when you look at it the first time.
Both approaches will work fine in the end. I don't care much, I just want something that works. It's probably a bit risky to merge Alexey's version before the planned mid-march release.
cheers, Gerd
On Mon, Mar 05, 2012 at 10:53:25AM +0100, Gerd Hoffmann wrote:
Given the churn in this area, I don't want to commit patches that do wholesale code replacement. I'd prefer to see each patch independently add some functionality and perform its related cleanup.
Hardly doable, the algorithms are very different.
I don't think that the algorithms are that different, and I don't think sending incremental patches is too difficult.
Looking at Alexey's patches, it seems that "struct pci_region_entry" == "struct pci_device.bars" and "struct pci_region" == "struct pci_bus.r". The pci_region_entry structs are dynamically allocated and put on lists, and the count/base arrays are replaced with list traversals. The core multi-pass algorithm which finds the devices, extracts the bar info, determines the required bus sizing, and then assigns the bars does not appear to be fundamentally different. I don't see why the data structures can't be converted in a series of incremental patches. The only significant algo change (replacement of count/base arrays with list traversal) should be a relatively simple reviewable patch once the data structures have been modified.
Both approaches will work fine in the end. I don't care much, I just want something that works. It's probably a bit risky to merge Alexey's version before the planned mid-march release.
FYI - the plan is to feature freeze in mid-march.
-Kevin
On 06/03/12 02:49, Kevin O'Connor wrote:
On Mon, Mar 05, 2012 at 10:53:25AM +0100, Gerd Hoffmann wrote:
Given the churn in this area, I don't want to commit patches that do wholesale code replacement. I'd prefer to see each patch independently add some functionality and perform its related cleanup.
Hardly doable, the algorithms are very different.
I don't think that the algorithms are that different, and I don't think sending incremental patches is too difficult.
Looking at Alexey's patches, it seems that "struct pci_region_entry" == "struct pci_device.bars" and "struct pci_region" == "struct pci_bus.r". The pci_region_entry structs are dynamically allocated and put on lists, and the count/base arrays are replaced with list traversals. The core multi-pass algorithm which finds the devices, extracts the bar info, determines the required bus sizing, and then assigns the bars does not appear to be fundamentally different. I don't see why the data structures can't be converted in a series of incremental patches. The only significant algo change (replacement of count/base arrays with list traversal) should be a relatively simple reviewable patch once the data structures have been modified.
Right, there is no point in making big functional changes, since the existing algorithms are the same. Note: a pci_region_entry could be either a pci_device.bar or a pci_bridge.region.
Both approaches will work fine in the end. I don't care much, I just want something that works. It's probably a bit risky to merge Alexey's version before the planned mid-march release.
FYI - the plan is to feature freeze in mid-march.
A question about lists: shall I move the list operations to a header file and make the PMM and stack code use the list service functions, or will that be applied later as a code "optimization"?
Kevin O'Connor wrote:
By my count, this would be the third "rewrite" of the PCI bar initialization in the last 14 months.
This happens to be solved really well in coreboot already. I cannot understand the crazy duplicated effort.
//Peter