Check for a terminating LAR member which tells us that no further LAR member except the bootblock will be found after this member. The LAR member has a normal MAGIC, but all other parts of struct lar_header are 0xff. That way, adding a new member in place of the terminating member will not need an erase cycle.
Signed-off-by: Carl-Daniel Hailfinger c-d.hailfinger.devel.2006@gmx.net
Index: LinuxBIOSv3-larplaceholder/lib/lar.c =================================================================== --- LinuxBIOSv3-larplaceholder/lib/lar.c (Revision 546) +++ LinuxBIOSv3-larplaceholder/lib/lar.c (Arbeitskopie) @@ -58,7 +58,7 @@
int find_file(const struct mem_file *archive, const char *filename, struct mem_file *result) { - char *walk, *fullname; + char *walk, *fullname, *tmp; struct lar_header *header;
printk(BIOS_INFO, "LAR: Attempting to open '%s'.\n", filename); @@ -105,10 +105,24 @@ for (walk = archive->start; (walk < (char *)(archive->start + archive->len - sizeof(struct lar_header))) && (walk >= (char *)archive->start); walk += 16) { - if (strncmp(walk, MAGIC, 8) != 0) + if (strncmp(walk, MAGIC, sizeof(header->magic)) != 0) continue;
header = (struct lar_header *)walk; + /* The loop below deserves an explanation: The final iteration + * of the loop will access the byte after the header. If that + * iteration has been reached, all checked bytes in the header + * were 0xff. This saves one state variable. + */ + for (tmp = walk + sizeof(header->magic); + tmp <= walk + sizeof(*header); tmp++) { + if (*tmp != (char)0xff) + break; + } + if (tmp == walk + sizeof(*header)) { + printk(BIOS_SPEW, "LAR: termination pseudomember reached.\n"); + break; + } fullname = walk + sizeof(struct lar_header);
printk(BIOS_SPEW, "LAR: seen member %s\n", fullname);
Carl-Daniel Hailfinger wrote:
Check for a terminating LAR member which tells us that no further LAR member except the bootblock will be found after this member. The LAR member has a normal MAGIC, but all other parts of struct lar_header are 0xff. That way, adding a new member in place of the terminating member will not need an erase cycle.
I don't see a gain in this. Since we know the position and size of the lar archive anyways, we know nothing will come after the bootblock.
There should not be any headers that do not belong to real "files" in the lar, that would be breaking our model.
On Jan 5, 2008 2:13 PM, Stefan Reinauer stepan@coresystems.de wrote:
I don't see a gain in this. Since we know the position and size of the lar archive anyways, we know nothing will come after the bootblock.
There should not be any headers that do not belong to real "files" in the lar, that would be breaking our model.
the issue is that, before cache is enabled, we lose many, many seconds as the lar code scans all of flash. So we need an 'end of headers' entry. We need to know when to stop scanning.
more later, gotta go.
ron
ron minnich wrote:
On Jan 5, 2008 2:13 PM, Stefan Reinauer stepan@coresystems.de wrote:
I don't see a gain in this. Since we know the position and size of the lar archive anyways, we know nothing will come after the bootblock.
There should not be any headers that do not belong to real "files" in the lar, that would be breaking our model.
the issue is that, before cache is enabled, we lose many, many seconds as the lar code scans all of flash. So we need an 'end of headers' entry. We need to know when to stop scanning.
Many seconds, though just scanning for an 8 byte signature at a 16byte aligned address?
That's a max of 32k string compares for every file that is _not_ found. Otherwise the number of compares is as high as the number of files in the lar minus 1. (ie. below 10)
We should fix linuxbios, so we do not look for files that are not there. rather than invent broken pseudo files.
This is stuff that people will look at in 1-2 years and not understand why it went in, and someone is going to say "I don't remember but there was a nasty reason why we did it. We better keep it."
On Jan 5, 2008 2:27 PM, Stefan Reinauer stepan@coresystems.de wrote:
Many seconds, though just scanning for an 8 byte signature at a 16byte aligned address?
it's depressing, but yes. It's also surprising.
That's a max of 32k string compares for every file that is _not_ found. Otherwise the number of compares is as high as the number of files in the lar minus 1. (ie. below 10)
no, it walks all of flash. see lib/lar.c -- or it seems to, it takes so long.
We should fix linuxbios, so we do not look for files that are not there. rather than invent broken pseudo files.
works for me ... how?
ron
ron minnich wrote:
On Jan 5, 2008 2:27 PM, Stefan Reinauer stepan@coresystems.de wrote:
Many seconds, though just scanning for an 8 byte signature at a 16byte aligned address?
it's depressing, but yes. It's also surprising.
That's a max of 32k string compares for every file that is _not_ found. Otherwise the number of compares is as high as the number of files in the lar minus 1. (ie. below 10)
no, it walks all of flash. see lib/lar.c -- or it seems to, it takes so long.
I can't find the behavior you describe in find_file(). It returns as soon as the file is found. Your delay must come from some other issue.
I suspect we are looking for files that are not there yet. Probably some normal/fallback stuff that our makefiles don't do yet, but stage1.c does. We should not look for fallback files if we don't compile any.
Or, we just add an "empty" lar with a correct header containing all of 0xff, having a correct size etc. Then we don't use pseudo headers at least. But this would break our design idea to allow files with holes in them so this should really be the last resort.
Stefan
On Jan 5, 2008 2:40 PM, Stefan Reinauer stepan@coresystems.de wrote:
I can't find the behavior you describe in find_file(). It returns as soon as the file is found. Your delay must come from some other issue.
it comes when a file is NOT found, which can happen.
I suspect we are looking for files that are not there yet. Probably some normal/fallback stuff that our makefiles don't do yet, but stage1.c does. We should not look for fallback files if we don't compile any.
you can't count on that .Suppose somebody builds a bios with fallback files, and then removes them later. And then adds them back. that's the beauty of lar -- we can change things.
NOT finding a file should be efficient.
Also, the code looks for payload/segment0, 1, 2, and stops when no more are found. That 'not found' really takes time. And, we can't compile this in -- payloads can change too, and the # segments can change.
So we need an efficient way to terminate the search. Carl-Daniel's idea is not too bad.
ron
On Sat, Jan 05, 2008 at 02:44:55PM -0800, ron minnich wrote:
NOT finding a file should be efficient.
Yes.
Where do the delay come from? Can anyone measure the LPC?
//Peter
On Jan 5, 2008 5:11 PM, Peter Stuge peter@stuge.se wrote:
On Sat, Jan 05, 2008 at 02:44:55PM -0800, ron minnich wrote:
NOT finding a file should be efficient.
Yes.
Where do the delay come from? Can anyone measure the LPC?
I will try to measure for you but remember it is running uncached. Each and every access is a full LPC cycle, which is not really fast.
ron
On 06.01.2008 05:52, ron minnich wrote:
On Jan 5, 2008 5:11 PM, Peter Stuge peter@stuge.se wrote:
On Sat, Jan 05, 2008 at 02:44:55PM -0800, ron minnich wrote:
NOT finding a file should be efficient.
Yes.
Where do the delay come from? Can anyone measure the LPC?
I will try to measure for you but remember it is running uncached. Each and every access is a full LPC cycle, which is not really fast.
IIRC the PCEngines alix.1c has parallel flash, so the following does not apply to our current problem.
However, for all boards with LPC-to-SPI translation through a IT8716F Super I/O chip, the following applies: My calculations suggest that for optimal reads (4 bytes at a 4 byte boundary) with the IT8716F SPI translation function, we need (1 byte opcode + 3 bytes addr + 4 bytes data)* 8 = 64 clock cycles on the SPI bus alone. With a maximum of 33 MHz for the SPI clock on that particular chip (MUST verify that the chip is set to 33 MHz and not 16 MHz!), this optimal read takes 2 usec. Smaller reads take at least 1.25 usec. On top of the pure SPI bus time we have to add the time to transmit and receive that data over LPC. In case we don't use the automatic LPC-to-SPI translation and use the embedded SPI controller directly, the LPC overhead is replaced by 6+ ISA port byte reads and 5 ISA port byte writes, which is probably even slower.
Regards, Carl-Daniel
On Jan 5, 2008 2:13 PM, Stefan Reinauer stepan@coresystems.de wrote:
Carl-Daniel Hailfinger wrote:
Check for a terminating LAR member which tells us that no further LAR member except the bootblock will be found after this member. The LAR member has a normal MAGIC, but all other parts of struct lar_header are 0xff. That way, adding a new member in place of the terminating member will not need an erase cycle.
I don't see a gain in this. Since we know the position and size of the lar archive anyways, we know nothing will come after the bootblock.
OK, I'm back. Stefan, we don't really need this for correctness. But on the alix1c I am observing huge delays at startup while find_file iterates through 512KB of empty FLASH looking for entries that are not there. So the goal here is to have a header which says "I'm the last one" so we can short-circuit scanning all of flash. This one change will shave quite a few seconds from boot time.
There should not be any headers that do not belong to real "files" in the lar, that would be breaking our model.
The model is fine, it's just slow. This is a performance patch.None of us anticipated the slowness of walking FLASH looking for LAR headers.
thanks
ron
* ron minnich rminnich@gmail.com [080105 23:23]:
OK, I'm back. Stefan, we don't really need this for correctness. But on the alix1c I am observing huge delays at startup while find_file iterates through 512KB of empty FLASH looking for entries that are not there. So the goal here is to have a header which says "I'm the last one" so we can short-circuit scanning all of flash. This one change will shave quite a few seconds from boot time.
And we can't cache the ROM because of CAR?
I am not convinced that failing to find files has to be a performance critical path - While I generally agree it must not take several seconds, something's wrong with our logic if we fail to find files before we have RAM and Cache enabled.
On Jan 6, 2008 3:29 AM, Stefan Reinauer stepan@coresystems.de wrote:
And we can't cache the ROM because of CAR?
yes.
I am not convinced that failing to find files has to be a performance critical path - While I generally agree it must not take several seconds, something's wrong with our logic if we fail to find files before we have RAM and Cache enabled.
I think we should eliminate all unnnecessary overhead. Scanning (e.g.) 2 Mbytes of memory for each failed lookup -- all the other BIOSes are going to point at us and make rude comments :-)
I really would like us to have a reasonable termination instead of searching all of memory. It can be anything -- make a file name called EOF. Carl-Daniel's design was nice because it was very flash-friendly, but I'm not picky.
thanks
ron
On 06.01.2008 20:29, ron minnich wrote:
On Jan 6, 2008 3:29 AM, Stefan Reinauer stepan@coresystems.de wrote:
And we can't cache the ROM because of CAR?
yes.
Well, we can't use CPU cache, but we can abuse CAR memory as cache (ironic, isn't it? I'll call it CARAC.) for the first few (or all) members in the LAR. The cost estimate would be 8-32 bytes per LAR member, depending on your personal opinion on memory usage vs. speed tradeoff.
I am not convinced that failing to find files has to be a performance critical path - While I generally agree it must not take several seconds, something's wrong with our logic if we fail to find files before we have RAM and Cache enabled.
I think we should eliminate all unnnecessary overhead. Scanning (e.g.) 2 Mbytes of memory for each failed lookup -- all the other BIOSes are going to point at us and make rude comments :-)
I have another orthogonal design pending which incurs the lookup cost only on the first lookup. See above for the basic idea. The first failed lookup will incur the (almost) full cost of walking the LAR, all subsequent lookups (whether successful or failed) will cost O(larmembers) RAM/CARAC accesses.
I really would like us to have a reasonable termination instead of searching all of memory. It can be anything -- make a file name called EOF. Carl-Daniel's design was nice because it was very flash-friendly, but I'm not picky.
I guess I need to explain the rationale in more detail. Our routines work just fine and with almost maximum efficiency achievable for a linear search as long as there are no empty (erased) areas before the LAR member we try to find. If there are empty areas in flash, we walk those empty areas in 16 byte steps. That is horribly inefficient and makes failed lookups slow, especially if your flash is almost empty.
The bootblock will always be the last member of the lar. We also try to fill the lar from the bottom up. That means unless the lar is completely full, we will walk an empty are in 16 byte increments. Since we know the location of the boot block (top of flash) and we also know we never have to lookup or load the bootblock during LB execution, there should be an easy way to specify "from here to the beginning of the bootblock there will be no file" which is equivalent to "stop searching here". The "no file till bootblock" functionality can be implemented in three ways: 1. Explicit "stop searching here" marker or lar member. 2. Explicit lar member covering the free space. 3. List free space somehere in flash.
Two of these ways fail in practice: 3. Listing free space in flash means erasing and/or rewriting that list everytime the lar changes. The only reasonable place for such a list would be the bootblock and we don't want to touch it during partial reflash. 2. Explicit free space covering lar member is fine, but to add any new file to the archive, you have to erase the part covered by the exlusive space.
That leaves way #1: 1. Explicit "stop searching here" marker. Either we arrange for the marker bit pattern to be overwritable without erasing or not. Using the "normal" LAR header infrastructure, that would be an entry starting with "LARCHIVE" followed by a series of 0xFF bytes . That way, we can create a new LAR member without having to erase anything because the "LARCHIVE" magic stays the same and all other bits are still 1 and can be written to.
Regards, Carl-Daniel
Carl-Daniel,
thanks for bringing this up.
* Carl-Daniel Hailfinger c-d.hailfinger.devel.2006@gmx.net [080107 10:40]:
The bootblock will always be the last member of the lar. We also try to fill the lar from the bottom up. That means unless the lar is completely full, we will walk an empty are in 16 byte increments. Since we know the location of the boot block (top of flash) and we also know we never have to lookup or load the bootblock during LB execution, there should be an easy way to specify "from here to the beginning of the bootblock there will be no file" which is equivalent to "stop searching here". The "no file till bootblock" functionality can be implemented in three ways:
- Explicit "stop searching here" marker or lar member.
- Explicit lar member covering the free space.
- List free space somehere in flash.
Two of these ways fail in practice: 3. Listing free space in flash means erasing and/or rewriting that list everytime the lar changes. The only reasonable place for such a list would be the bootblock and we don't want to touch it during partial reflash.
- Explicit free space covering lar member is fine, but to add any new
file to the archive, you have to erase the part covered by the exlusive space.
I don't think this is such a bit problem. Lar could remove this entry automatically, and recreate it. This is not more special code in lar than 1. But I start thinking that those two are pretty much the same in the end.
What about a linked list in the header? Each file could explicitly point to the next file. If the pointer is 0 or -1, we stop.
This way we would not lock out the "bootblock" file in our lar search.
On 07.01.2008 14:01, Stefan Reinauer wrote:
Carl-Daniel,
thanks for bringing this up.
You're welcome.
- Carl-Daniel Hailfinger c-d.hailfinger.devel.2006@gmx.net [080107 10:40]:
The bootblock will always be the last member of the lar. We also try to fill the lar from the bottom up. That means unless the lar is completely full, we will walk an empty are in 16 byte increments. Since we know the location of the boot block (top of flash) and we also know we never have to lookup or load the bootblock during LB execution, there should be an easy way to specify "from here to the beginning of the bootblock there will be no file" which is equivalent to "stop searching here". The "no file till bootblock" functionality can be implemented in three ways:
- Explicit "stop searching here" marker or lar member.
- Explicit lar member covering the free space.
- List free space somehere in flash.
Two of these ways fail in practice: 3. Listing free space in flash means erasing and/or rewriting that list everytime the lar changes. The only reasonable place for such a list would be the bootblock and we don't want to touch it during partial reflash.
- Explicit free space covering lar member is fine, but to add any new
file to the archive, you have to erase the part covered by the exlusive space.
I don't think this is such a bit problem. Lar could remove this entry automatically, and recreate it. This is not more special code in lar than 1. But I start thinking that those two are pretty much the same in the end.
Yes, 1 and 2 are very similar. Your proposal is almost exactly what I had in mind. This "dummy" member would be removed once a file was added to the lar and the new entry would "overwrite" the dummy header. If we have an explicit size in the dummy header, we can point to the next header easily without having to stop. However, once we use an explicit size (setting some bits to 0), we have to erase the flash block to add a new entry in place of the dummy. If we don't specify the size and keep all header entries (except magic) at value 0xff (all bits set), we still can write the new archive member without erasing because setting bits to 0 is always possible.
My proposed code change is not very flexible and really only serves as "end of interesting lar members" marker, but it is flash-friendly. Besides that, the code can be overridden so if you really want to look at the boot block, you can do so easily.
What about a linked list in the header? Each file could explicitly point to the next file. If the pointer is 0 or -1, we stop.
Nice idea, but it may not work in all cases. Requirements for this to work cleanly are: - The next pointer needs to have all bits set to 1 in the "stop pointer" case, or it is impossible to add a member after the member with a stop pointer without erasing the sector containing the header with the stop pointer. - The next pointer needs a way to be changed to a "stop pointer" without erasing (so the "stop pointer" must have at least one bit set to 0), or it is impossible to remove one archive member (and setting the previous next pointer to "stop pointer") without erasing the sector containing the header of the last member before the erased member. Conclusion: All bits must be set to 1 in the "stop pointer" value for adding to work, but at least one bit must be set to 0 in the "stop pointer" value for erasing to work. That's a conflict.
This way we would not lock out the "bootblock" file in our lar search.
The "stop member" recognition could be disabled on a per-case basis.
Regards, Carl-Daniel
* Carl-Daniel Hailfinger c-d.hailfinger.devel.2006@gmx.net [080107 14:37]:
- Explicit free space covering lar member is fine, but to add any new
file to the archive, you have to erase the part covered by the exlusive space.
I don't think this is such a bit problem. Lar could remove this entry automatically, and recreate it. This is not more special code in lar than 1. But I start thinking that those two are pretty much the same in the end.
Yes, 1 and 2 are very similar. Your proposal is almost exactly what I had in mind. This "dummy" member would be removed once a file was added to the lar and the new entry would "overwrite" the dummy header. If we have an explicit size in the dummy header, we can point to the next header easily without having to stop. However, once we use an explicit size (setting some bits to 0), we have to erase the flash block to add a new entry in place of the dummy.
One plea though: What you describe usually only happens in a (currently) very rare case. That is the case where the "free" entry falls exactly on a flash block border.
Sure. We could have gaps to match up flash chip borders again. But this would re-introduce the problem of having to slowly walk a lot of space all the time.
Also, a potential update might as well exchange modules instead of just adding modules that were not there before. This will potentially shift all files in the lar, making it necessary to reflash the complete flash chip.
If we don't specify the size and keep all header entries (except magic) at value 0xff (all bits set), we still can write the new archive member without erasing because setting bits to 0 is always possible.
I am not convinced that safing flash erase cycles on bios updates is worth any design considerations. With one BIOS update a day, even deleting all flash blocks makes an average SST chip live at least 273 years. Do we need to extend that?
My proposed code change is not very flexible and really only serves as "end of interesting lar members" marker, but it is flash-friendly. Besides that, the code can be overridden so if you really want to look at the boot block, you can do so easily.
So that would be another runtime parameter?
What about a linked list in the header? Each file could explicitly point to the next file. If the pointer is 0 or -1, we stop.
Nice idea, but it may not work in all cases. Requirements for this to work cleanly are:
- The next pointer needs to have all bits set to 1 in the "stop pointer"
case, or it is impossible to add a member after the member with a stop pointer without erasing the sector containing the header with the stop pointer.
Oh my idea was to have the bootblock have a "next" member of -1. So there would always be a next member with 0 and 1 bits except for the last one (that will not allow you to put one behind it anyways, due to the reset vector on the x86 architecture)
Keep in mind that currently flashrom ALWAYS erases the complete flash chip except for some esoteric uses of the PMC PM49FL004.
No, I am not saying that we should keep a good concept out of LinuxBIOS just because one of the tools can not cope with this yet. But I fail to see the whole picture in the solution yet.
Stefan
New plan:
Scratch all "special" LAR member stuff and simply add a normal file called "foo" or "placeholder" with the nocompress: parameter to the LAR. No new code needed, gives us almost everything we need right now.
What do you think?
Regards, Carl-Daniel
Hi,
On Sun, Jan 20, 2008 at 10:31:23PM +0100, Carl-Daniel Hailfinger wrote:
New plan:
Scratch all "special" LAR member stuff and simply add a normal file called "foo" or "placeholder" with the nocompress: parameter to the LAR. No new code needed, gives us almost everything we need right now.
What do you think?
I have been thinking about the problem but I can't come up with any really simple solution that is without drawbacks.
So far I still prefer making sure that coreboot does not look for files that do not exist, rather than adding special files. But there are problems with that approache too. We want the lar to be dynamic so things could change after coreboot has been built and I agree with Ron that missing files, though we should make sure it's not a common case, shall still be pretty efficient.
I like the plan Carl-Daniel, it's a simple workaround that buys us time to think. Someone may come up with a new solution to all of this once flashrom can work with individual sectors.
//Peter
Peter Stuge wrote:
Hi,
On Sun, Jan 20, 2008 at 10:31:23PM +0100, Carl-Daniel Hailfinger wrote:
New plan:
Scratch all "special" LAR member stuff and simply add a normal file called "foo" or "placeholder" with the nocompress: parameter to the LAR. No new code needed, gives us almost everything we need right now.
What do you think?
I have been thinking about the problem but I can't come up with any really simple solution that is without drawbacks.
So far I still prefer making sure that coreboot does not look for files that do not exist, rather than adding special files. But there are problems with that approache too. We want the lar to be dynamic so things could change after coreboot has been built and I agree with Ron that missing files, though we should make sure it's not a common case, shall still be pretty efficient.
We should distinguish here between the files before and after raminit. I don't think we should have any false file lookups before we can start caching the rom.
I like the plan Carl-Daniel, it's a simple workaround that buys us time to think. Someone may come up with a new solution to all of this once flashrom can work with individual sectors.
//Peter
On Jan 7, 2008 1:40 AM, Carl-Daniel Hailfinger c-d.hailfinger.devel.2006@gmx.net wrote:
Well, we can't use CPU cache, but we can abuse CAR memory as cache (ironic, isn't it? I'll call it CARAC.) for the first few (or all) members in the LAR. The cost estimate would be 8-32 bytes per LAR member, depending on your personal opinion on memory usage vs. speed tradeoff.
I would rather not walk all of empty space ever at any time. I like the end marker best. We can do it many ways, I leave it to you folks, but let's try to not look at 2^16-2^4 entries in search of something :-)
I still find the explicit termination marker very interesting, esp. since it is flash rewrite friendly.
Pointers, let's not do them, instead, if we want such a thing, let's do indices. If we start doing pointers, LAR is no longer location-independent.
thanks
ron
On 07.01.2008 17:40, ron minnich wrote:
On Jan 7, 2008 1:40 AM, Carl-Daniel Hailfinger c-d.hailfinger.devel.2006@gmx.net wrote:
Well, we can't use CPU cache, but we can abuse CAR memory as cache (ironic, isn't it? I'll call it CARAC.) for the first few (or all) members in the LAR. The cost estimate would be 8-32 bytes per LAR member, depending on your personal opinion on memory usage vs. speed tradeoff.
I would rather not walk all of empty space ever at any time. I like the end marker best. We can do it many ways, I leave it to you folks, but let's try to not look at 2^16-2^4 entries in search of something :-)
My cache idea is completely orthogonal to the "walk empty space or not" issue. lib/lar.c would set up a cache of all LAR entries it encountered so far, making subsequent lookups almost pure cache memory operations.
I still find the explicit termination marker very interesting, esp. since it is flash rewrite friendly.
Pointers, let's not do them, instead, if we want such a thing, let's do indices. If we start doing pointers, LAR is no longer location-independent.
Agreed, but even for indices, the reasoning from my other mail applies. It is almost impossible to do without erasing unaffected flash sectors. There is one way to use indices sensibly without erasing the block containing the previous LAR header, but it will require an erase of the placeholder member. No-erase adding of LAR members with the requirement of no-collateral-erase-ever is not possible unless you allow me to add a dozen lines of code and uglify the central LAR parsing loop in a way that nobody ever is going to be able to verify the code. And I need 8 additional bytes in the header. There will be lots of special cases and people will have a strong urge to start v4 ;-)
Regards, Carl-Daniel
* ron minnich rminnich@gmail.com [080107 17:40]:
I still find the explicit termination marker very interesting, esp. since it is flash rewrite friendly.
With a 100k average possible erase cycles per flash block, I never managed to exceed the life time for a flash chip, before they would die physically (ie. bend pins etc)
The one reason why we don't want to erase the flash is because we don't want to have to rewrite the bootblock, risking to render the system unusable without special equipment.
But to achieve safety here we don't even need to touch the "previous" element if we want to add a new element to the lar. Because the previous would point to the "empty" file or marker, and that's exactly the position where a new file would be put.
Pointers, let's not do them, instead, if we want such a thing, let's do indices. If we start doing pointers, LAR is no longer location-independent.
Obviously we would not add the base address of the lar in memory when having a linked list but only keep the offset from the start of the lar. It's still a linked list.
On Sat, Jan 05, 2008 at 08:52:49PM -0800, ron minnich wrote:
On Jan 5, 2008 5:11 PM, Peter Stuge peter@stuge.se wrote:
On Sat, Jan 05, 2008 at 02:44:55PM -0800, ron minnich wrote:
NOT finding a file should be efficient.
Yes.
Where do the delay come from? Can anyone measure the LPC?
I will try to measure for you but remember it is running uncached. Each and every access is a full LPC cycle, which is not really fast.
True. It is definately LPC access time causing this.
On Sun, Jan 06, 2008 at 05:17:31PM +0100, Carl-Daniel Hailfinger wrote:
IIRC the PCEngines alix.1c has parallel flash
Nope. Both the soldered-on flash and obviously the LPC port are LPC.
for all boards with LPC-to-SPI translation through a IT8716F Super I/O chip, the following applies: My calculations suggest that for optimal reads (4 bytes at a 4 byte boundary) with the IT8716F SPI translation function, we need (1 byte opcode + 3 bytes addr + 4 bytes data)* 8 = 64 clock cycles on the SPI bus alone. With a maximum of 33 MHz for the SPI clock on that particular chip (MUST verify that the chip is set to 33 MHz and not 16 MHz!), this optimal read takes 2 usec. Smaller reads take at least 1.25 usec. On top of the pure SPI bus time we have to add the time to transmit and receive that data over LPC.
Say 5 us per byte. That's rather slow for us, yes.
On Mon, Jan 07, 2008 at 10:40:02AM +0100, Carl-Daniel Hailfinger wrote:
Well, we can't use CPU cache, but we can abuse CAR memory as cache
I like this.
- Explicit "stop searching here" marker.
Maybe. But I do not think this is very clean..
On Mon, Jan 07, 2008 at 08:40:31AM -0800, ron minnich wrote:
I would rather not walk all of empty space ever at any time.
I think the cost of one walk is ok for the flexibility gained.
I like the end marker best.
I think the runtime-created index is much more fun. It also keeps the lar format very clean.
In practice, the marker is the most efficient method, but it is a bit of an abomination IMO. :p
How do we handle the integrity issue when a single flash block that contains the start of a lar file is erased at runtime? (Thus breaking a link in the list.)
That re-introduces the walking cost if we want to be able to find any files beyond the erased block.
Maybe both run-time indexing and marker is good? But then there's no point in the marker. So only build the index when a hole is encountered.
//Peter
On Jan 10, 2008 5:38 PM, Peter Stuge peter@stuge.se wrote:
I think the runtime-created index is much more fun. It also keeps the lar format very clean.
Since I suffered the pain on a certain project for wasting milliseconds, I would rather not waste milliseconds. In this case, though, we're wasting hundreds and hundreds of milliseconds.
They're lost forever and never coming back.
In practice, the marker is the most efficient method, but it is a bit of an abomination IMO. :p
Performance hacks often are :-)
ron
On 11.01.2008 02:38, Peter Stuge wrote:
On Sat, Jan 05, 2008 at 08:52:49PM -0800, ron minnich wrote:
On Jan 5, 2008 5:11 PM, Peter Stuge peter@stuge.se wrote:
On Sat, Jan 05, 2008 at 02:44:55PM -0800, ron minnich wrote:
NOT finding a file should be efficient.
Yes.
Where do the delay come from? Can anyone measure the LPC?
I will try to measure for you but remember it is running uncached. Each and every access is a full LPC cycle, which is not really fast.
True. It is definately LPC access time causing this.
Yes.
On Sun, Jan 06, 2008 at 05:17:31PM +0100, Carl-Daniel Hailfinger wrote:
IIRC the PCEngines alix.1c has parallel flash
Nope. Both the soldered-on flash and obviously the LPC port are LPC.
Even LPC alone is faster than LPC-to-slow-SPI-to-LPC.
for all boards with LPC-to-SPI translation through a IT8716F Super I/O chip, the following applies: My calculations suggest that for optimal reads (4 bytes at a 4 byte boundary) with the IT8716F SPI translation function, we need (1 byte opcode + 3 bytes addr + 4 bytes data)* 8 = 64 clock cycles on the SPI bus alone. With a maximum of 33 MHz for the SPI clock on that particular chip (MUST verify that the chip is set to 33 MHz and not 16 MHz!), this optimal read takes 2 usec. Smaller reads take at least 1.25 usec. On top of the pure SPI bus time we have to add the time to transmit and receive that data over LPC.
Say 5 us per byte. That's rather slow for us, yes.
Hopefully that's the worst case. Still horrible if we use 1 MByte ROMs - 5 seconds for reading the whole ROM is unacceptable.
On Mon, Jan 07, 2008 at 10:40:02AM +0100, Carl-Daniel Hailfinger wrote:
Well, we can't use CPU cache, but we can abuse CAR memory as cache
I like this.
I'll try to code up some generic LAR caching extension over the weekend unless someone beats me to it. (Hint, hint!)
- Explicit "stop searching here" marker.
Maybe. But I do not think this is very clean..
Not very clean, but extremely flash-friendly. And in contrast to a "skip n bytes" marker you can add a new LAR member in its place without erasing the marker (at least that's how my implementation performs).
On Mon, Jan 07, 2008 at 08:40:31AM -0800, ron minnich wrote:
I would rather not walk all of empty space ever at any time.
I think the cost of one walk is ok for the flexibility gained.
We have to make sure it is never more than one full walk. Everything else is embarrassing.
I like the end marker best.
I think the runtime-created index is much more fun. It also keeps the lar format very clean.
We can do both. It seems that at least for the runtime-created index/cache we all agree.
In practice, the marker is the most efficient method, but it is a bit of an abomination IMO. :p
Big question: Is v3 a matter of pride or speed or userfriendliness? I haven't decided yet. ;-)
How do we handle the integrity issue when a single flash block that contains the start of a lar file is erased at runtime? (Thus breaking a link in the list.)
If we really do erases at LB runtime, the person waiting for the erase surely has the time to rebuild the index/cache (which will probably need a fraction of the time needed for erase).
That re-introduces the walking cost if we want to be able to find any files beyond the erased block.
Maybe both run-time indexing and marker is good? But then there's no point in the marker. So only build the index when a hole is encountered.
Avoiding LPC cycles is always a reason to build an index/cache. L2 cache is orders of magnitude faster than LPC. For n LAR members, a failed lookup will either cost you ~n*20 LPC transactions with 32 bit each or n*2 L2 cache accesses. I'd claim a factor of 1000 speed improvement.
Regards, Carl-Daniel
* Carl-Daniel Hailfinger c-d.hailfinger.devel.2006@gmx.net [080111 03:07]:
Hopefully that's the worst case. Still horrible if we use 1 MByte ROMs - 5 seconds for reading the whole ROM is unacceptable.
We will have to bear with that if we plan to check the rom checksum at some point. This feature should definitely be an option though, given the slow speed of the flashes.
Stefan
Carl-Daniel Hailfinger wrote:
Hopefully that's the worst case. Still horrible if we use 1 MByte ROMs - 5 seconds for reading the whole ROM is unacceptable.
Sort of tangential, but some asked for some measurements of speed of reads on 5536. I did some tests reading 64KB sequentially from an SST49LF002A on a 500MHz Geode LX platform. The numbers are all scaled up to time to read 1MB:
905ms - normal config 897ms - with a 5536 PCI region config added for the ROM region 738ms - with the region config set to prefetchable also
This was just a rep movsd from the ROM to another memory location.
So that is about 1us/byte, getting better with better configuration.
Turning on a 5536 region config has the effect of making the reads get decoded medium instead of subtractive on PCI. Prefetch will let the 5536 use its PCI buffers to prefetch data from the ROM, so it has a big effect on sequential reads.