Hello, all.
I've been playing around with a buspirate and my uISP ( https://github.com/uISP/ ) dongle. Even with newer firmware the buspirate is really slow: on average it takes 3m 49s to read out a 2MiB SPI flash.
So I decided to play around and see if I can make a faster programmer with my uISP dongle. Basically it's an atmega8 with a 12 MHz crystal running the vusb stack (no hardware USB). Any similar hardware would do. It is really cheap (the BOM is less than $5 including the PCB).
My first attempt was creating a serprog-compatible firmware for uISP. It can be found here:
https://github.com/uISP/uisp-app-serprog
The results weren't very exciting. It took 4m 30s to read out a 2MiB SPI flash (EN25QH16) with a 12 MHz crystal, and the same with a 20 MHz crystal. It seems to be a limitation of bulk out transfers via vusb.
The second attempt required patching flashrom itself, but resulted in a MUCH simpler firmware that fits in roughly 115 lines of code (not counting vusb). It just uses control transfers for both reads and writes. With vusb long transfers enabled, a 20 MHz crystal, 10 MHz SPI speed and max_data_read/max_data_write set to 4096 bytes, reading a 2MiB SPI flash took only 2m 13s, which I consider a WIN.
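4096 is not a vusb limit. Control transfers top out at 4096 bytes on the host side anyway (Linux usbfs rejects anything larger, although the 16-bit wLength field in the USB spec would itself allow up to 65535 bytes).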
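To make the chunking concrete, here is a minimal sketch of how a large read could be split into requests no bigger than max_data_read. The function name `split_read` and the (offset, size) layout are made up for illustration; they are not taken from the flashrom patch or the firmware.

```python
def split_read(start, length, max_data_read=4096):
    """Yield (offset, size) pairs covering [start, start + length).

    Hypothetical helper: each pair stands for one control transfer
    that asks the programmer for `size` bytes starting at `offset`.
    """
    if max_data_read <= 0:
        raise ValueError("max_data_read must be positive")
    offset = start
    end = start + length
    while offset < end:
        # The last request may be shorter than max_data_read.
        size = min(max_data_read, end - offset)
        yield offset, size
        offset += size


if __name__ == "__main__":
    flash_size = 2 * 1024 * 1024  # 2 MiB part, as in the timings above
    requests = list(split_read(0, flash_size))
    print(len(requests), "requests, last one:", requests[-1])
```

With 4096-byte chunks a 2MiB read comes out as 512 requests, so the per-request overhead of a control transfer is paid 512 times.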
Using 254-byte max_data_read/max_data_write with long transfers disabled would be slower, but then it is possible to fit everything into 2K of flash and use an attiny2313, which would make it even cheaper. The protocol's simple as hell, so anyone with better hardware around (STM32, atmega8u2, pic32, whatever) can implement it with very little effort and get lightning-fast speeds. My WIP patch to flashrom is attached, although I have a few silly questions left:
* Do I need to implement SPI frequency changing via a dedicated command, or is it okay to hardcode it to 10 MHz in firmware?
* Right now I'm using the same spi_read as serprog does. Is that okay, or should I add an option to disable it?
* Firmware version checking, etc. (Is it a good idea to implement?)
* Is there a better way to benchmark read/write speed in flashrom, since delay loop calibration interferes with speed measurements? (Currently I just use "time flashrom ...")
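For comparison, the three wall-clock timings quoted above can be turned into throughput figures with a few lines of Python. This is only arithmetic on the numbers in this mail; the helper names are made up, and since the timings come from "time flashrom ..." they include flashrom's startup and delay loop calibration, so the real transfer rates are somewhat higher.

```python
FLASH_BYTES = 2 * 1024 * 1024  # 2 MiB SPI flash


def seconds(minutes, secs):
    """Convert an 'Xm Ys' timing into seconds."""
    return minutes * 60 + secs


def kib_per_second(total_bytes, elapsed_s):
    """Average throughput in KiB/s over the whole run."""
    return total_bytes / elapsed_s / 1024


def transfers_needed(total_bytes, chunk):
    """Number of requests of at most `chunk` bytes (integer ceiling)."""
    return -(-total_bytes // chunk)


# Timings as reported in the mail (whole flashrom run, not just the transfer).
TIMINGS = {
    "buspirate": seconds(3, 49),
    "uISP, serprog firmware": seconds(4, 30),
    "uISP, control transfers (4096-byte chunks)": seconds(2, 13),
}

if __name__ == "__main__":
    for name, elapsed in TIMINGS.items():
        print(f"{name}: {kib_per_second(FLASH_BYTES, elapsed):.1f} KiB/s")
    for chunk in (4096, 254):
        print(f"{chunk}-byte chunks: {transfers_needed(FLASH_BYTES, chunk)} transfers")
```

The last loop shows why the 254-byte variant is expected to be slower: it needs roughly sixteen times as many transfers for the same 2MiB.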
On Sun, 25 May 2014 17:49:04 +0400 Andrew andrew@ncrmnt.org wrote:
[...]
Hi,
as you are probably aware, we have quite a backlog of open patches, and yours is about the exact opposite of easy to deal with. :) Please don't expect a detailed review any time soon. That said, thank you very much for sending your patch (and your open hardware :). We would certainly like to include a module for your programmer eventually.
I'll try to quickly answer your questions:
- SPI frequency is rather unimportant, and an interface to change it can always be added later IMHO.
- I don't fully understand your question. You have copied the function from serprog AFAICS. That's OK for now.
- serprog supports version checks, but we have had no need to use them yet(?) because we only added features (and opcodes) and did not change existing functions in incompatible ways. I have not looked at your protocol, so I can't say if it is more likely to need it later. In general I think it would be a good idea to add this rather early.
- No benchmark functionality yet. If the calibration loop or anything else really makes a difference, then your performance is good enough already IMHO ;) You could easily add some timekeeping to doit() and print the difference between two timestamps at the end if you want more precision.
HTH for now, please keep us updated about your progress and thanks again.
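The timekeeping idea can be sketched as follows. This is a Python stand-in, not flashrom code: `timed` and `fake_read` are hypothetical names, and in flashrom itself the equivalent would be two timestamps taken in C around the work done in doit().

```python
import time


def timed(operation, *args, **kwargs):
    """Run `operation` and return (result, elapsed_seconds).

    Only the call itself is measured, so startup work such as
    delay loop calibration stays out of the figure.
    """
    start = time.monotonic()
    result = operation(*args, **kwargs)
    elapsed = time.monotonic() - start
    return result, elapsed


def fake_read(nbytes):
    """Placeholder for the real flash read; just returns zero bytes."""
    return bytes(nbytes)


if __name__ == "__main__":
    data, elapsed = timed(fake_read, 2 * 1024 * 1024)
    print(f"read {len(data)} bytes in {elapsed:.6f} s")
```

A monotonic clock is used so that wall-clock adjustments during a multi-minute read cannot skew the result.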
Stefan Tauner wrote on 01.06.2014 04:33:
[...]
Thanks for the reply. I will be refining and documenting the protocol and hardware shortly and will resend the patch this week along with a more detailed writeup, pictures and benchmarks.
So far the changes implemented are:
* Protocol versioning support
* Programmer now reports CPU frequency and maximum SPI frequency
* SPI frequency setting (for now with a programmer param)
* Multi-CS support (if the programmer has several flashes attached to different chip-selects)
* Programmer lock/unlock (another accidentally started instance of flashrom won't screw us up anymore)

Any ideas what else to implement?
I will also be making some new hardware based on STM32 that should provide a real speed boost, but since the actual chips are stuck somewhere between China and Russia this may take a while. My goal is to make a blazing fast programmer with a BOM cost of under $5 (and possibly parallel flash programming support as well).
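One way the reported CPU frequency and the SPI frequency setting could fit together is sketched below. It assumes an AVR-style SPI master whose clock is the CPU clock divided by 2, 4, 8, 16, 32, 64 or 128 (which matches the 20 MHz crystal / 10 MHz SPI figures from the first mail) and picks the fastest rate that does not exceed the requested one. `pick_spi_divider` is a hypothetical helper, not part of the firmware protocol.

```python
AVR_SPI_DIVIDERS = (2, 4, 8, 16, 32, 64, 128)


def pick_spi_divider(cpu_hz, requested_hz, dividers=AVR_SPI_DIVIDERS):
    """Return (divider, actual_hz) for the fastest SPI clock <= requested_hz.

    If even the largest divider is still too fast, fall back to the
    slowest clock available rather than failing.
    """
    for div in sorted(dividers):
        # Integer comparison, equivalent to cpu_hz / div <= requested_hz.
        if cpu_hz <= requested_hz * div:
            return div, cpu_hz // div
    div = max(dividers)
    return div, cpu_hz // div


if __name__ == "__main__":
    for cpu_hz in (12_000_000, 20_000_000):
        div, actual = pick_spi_divider(cpu_hz, 10_000_000)
        print(f"CPU {cpu_hz} Hz: divider {div}, SPI clock {actual} Hz")
```

Doing the selection on the host keeps the firmware small: the programmer only has to report its CPU frequency and accept a divider.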
On Mon, 02 Jun 2014 11:35:31 +0400 Andrew andrew@ncrmnt.org wrote:
- Programmer lock/unlock (Another accidentally started instance of
flashrom won't screw us up anymore)
chromiumos' flashrom has something like that implemented globally IIRC. It does not make too much sense to implement that for a single programmer... could two instances get access to the interface concurrently at all?
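For illustration, a global lock of this kind can be as simple as an advisory lock on a well-known file. The sketch below is generic Python for Unix-like systems, not how chromiumos' flashrom actually does it; the class name `ProgrammerLock` and the lock file path are made up.

```python
import fcntl
import os


class ProgrammerLock:
    """Advisory, non-blocking lock on a file shared by all instances."""

    def __init__(self, path):
        self.path = path
        self.fd = None

    def acquire(self):
        """Try to take the lock; return False if someone else holds it."""
        fd = os.open(self.path, os.O_CREAT | os.O_RDWR, 0o600)
        try:
            fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except BlockingIOError:
            os.close(fd)
            return False
        self.fd = fd
        return True

    def release(self):
        """Drop the lock if we hold it."""
        if self.fd is not None:
            fcntl.flock(self.fd, fcntl.LOCK_UN)
            os.close(self.fd)
            self.fd = None


if __name__ == "__main__":
    lock = ProgrammerLock("/tmp/programmer-example.lock")  # hypothetical path
    if lock.acquire():
        print("got the lock, safe to talk to the programmer")
        lock.release()
    else:
        print("another instance is already running")
```

Because the kernel drops the lock when the process exits, a crashed instance cannot leave the programmer locked forever.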
I will also be making some new hardware based on STM32 that should provide a real speed boost, but since the actual chips are stuck somewhere between China and Russia this may take a while.
I have a discovery board lying around with an STM32F4... but have never plugged it in :/
My goal is to make a blazing fast programmer with a BOM cost of under $5 (and possibly parallel flash programming support as well).
You might be interested in http://qiprog.org/ The GSoC student that made it last year sadly joined the US armed forces, so don't expect too many answers regarding it from him. Peter Stuge was the mentor and is probably the one to ask if you have questions not answered by the website/code or last year's blog articles http://blogs.coreboot.org/blog/author/mrnuke/
Stefan Tauner wrote on 02.06.2014 21:02:
On Mon, 02 Jun 2014 11:35:31 +0400 Andrew andrew@ncrmnt.org wrote:
- Programmer lock/unlock (Another accidentally started instance of
flashrom won't screw us up anymore)
chromiumos' flashrom has something like that implemented globally IIRC. It does not make too much sense to implement that for a single programmer... could two instances get access to the interface concurrently at all?
Okay, I'll drop it then. Thanks. For an STM32-based programmer, though, it would be possible and would make sense to talk concurrently to several SPI chips connected to several hardware SPI interfaces; the STM32 has the guts to do so without any slowdowns. So I'm thinking of implementing this feature in the core code, but leaving it out for AVR.
For STM32 it might also make more sense to export a /dev/ttyACMx for each SPI interface and use the serprog protocol, but that needs testing. On AVR, vusb-based cdc_acm is damn slow for device->host transfers and doesn't work with some USB hosts. And I'm not sure it would be the fastest way on an STM32 either.
STM32 right now requires love, since ST's USB device stack is utterly unusable crap and definitely has a race condition somewhere that causes enumeration problems. I'm playing with it for my pet antares project ( http://github.com/nekromant/antares ), but ST's code needs a lot of refactoring before it can be used at all (or even a rewrite from scratch).
I will also be making some new hardware based on STM32 that should provide a real speed boost, but since the actual chips are stuck somewhere between China and Russia this may take a while.
I have a discovery board lying around with an STM32F4... but have never plugged it in :/
Actually, you can even use an stm32f1 discovery as a flashrom programmer with stlink as the interface ;) The stlink will deliver ~25Kb/sec speeds. I did this a while ago using this hack of mine: http://ncrmnt.org/wp/2013/05/06/stlink-as-a-serial-terminal/ when I had to urgently debrick a router. But it was utter hackery and I didn't bother to send the patch to flashrom. ST also has COMPLETELY different USB device stacks for stm32f4x and stm32f1x with COMPLETELY different APIs. (And both look weird as hell.) For now I'm targeting stm32f1x.
My goal is to make a blazing fast programmer with a BOM cost of under $5 (and possibly parallel flash programming support as well).
You might be interested in http://qiprog.org/ The GSoC student that made it last year sadly joined the US armed forces, so don't expect too many answers regarding it from him. Peter Stuge was the mentor and is probably the one to ask if you have questions not answered by the website/code or last year's blog articles http://blogs.coreboot.org/blog/author/mrnuke/
Thanks for the links, I'll have a look. But I guess it will be faster for me to design it from scratch. Not a big deal, anyway.