Sorry to interrupt, but it seems that there are conflicting definitions of full-duplex...
On 19.04.2018 05:39, Aaron Durbin via coreboot wrote:
On Wed, Apr 18, 2018 at 8:46 PM, Julius Werner jwerner@chromium.org wrote:
I was trying to understand how to best implement the coreboot SPI API for a new SoC, and as I was reading into it I got increasingly confused. This has changed a bit last year (with the addition of struct spi_ctrlr), and I don't think it was fully consistent before that across all implementations either. Basically, we've always had the following base transfer prototype (used to be spi_xfer(), now a callback in struct spi_ctrlr):
int (*xfer)(const struct spi_slave *slave, const void *dout, size_t
bytesout, void *din, size_t bytesin);
There's also a more recent addition that can handle multiple transfer "vectors" at once, and the code will fall back to the basic xfer() callback if a controller doesn't implement this:
struct spi_op { const void *dout; size_t bytesout; void *din; size_t bytesin; enum spi_op_status status; }; int (*xfer_vector)(const struct spi_slave *slave, struct spi_op
vectors[], size_t count);
My question is basically what happens when bytesout and bytesin are both non-zero. My previous understanding was always that the SPI controller
It's up to the spi controller itself. They should support full duplex, if possible.
Are you aware of any flash chip supporting full-duplex operation?
should then operate in full-duplex mode, clocking out bytes from the dout array while reading bytes into the din array in the same cycle. AFAIK no driver in coreboot actually wants to use the controller this way, but it is a legitimate way to operate a SPI bus and it's not impossible that we may one day have a need for it. The Rockchip, Samsung and (I believe, those are a little harder to read) Nvidia SoC drivers are implemented this way. There is also the following comment (and associated implemented code) in spi_flash.c:do_spi_flash_cmd() which makes me think that this understanding was at least at some point the intended one:
/* * SPI flash requires command-response kind of behavior. Thus, two * separate SPI vectors are required -- first to transmit dout and
other * to receive in din. If some specialized SPI flash controllers * (e.g. x86) can perform both command and response together, it should * be handled at SPI flash controller driver level. */ <...array with two spi_op structs, one having only din and one only dout...>
cmd *then* response behavior. As you noted you'd have to eat the cmd bytes in the receive buffer. The issue is that we are dealing with 2 different types of spi *flash* controllers:
- generic spi controller that can do any normal spi thing
- spi flash controllers on x86 systems. The ones on x86 systems are
not a normal spi controller, and they can't really be treated like one.
Speaking with my flashrom hat on, it's a bit more complicated.
Duplex perspective: There are two main types (and a few subtypes) of SPI controllers used for flash: 1.) Full-duplex controllers: Capable of sending and receiving at the same time 1.a) For each clock cycle, a bit is sent and a bit is received at the same time (e.g. Raspberry Pi SPI) 1.b) For the first 8 clock cycles, a bit is sent. After that, for each clock cycle, a bit is sent and a bit is received at the same time (e.g. AMD SB600 and successors) 1.c) After a defined number of clock cycles, a constant number of bits is ignored in both directions (e.g. 8 bits ignored for FAST READ command, supported by some controllers) 2.) Half-duplex controllers: Capable of either sending or receiving at any given time 2.a) Send-then-receive: A number of bits (>0) is sent, and after that a number of bits (>=0) is received (e.g. Intel SPI) 2.b) After a defined number of clock cycles, a constant number of bits is ignored in both directions (e.g. 8 bits ignored for FAST READ command, supported by some controllers)
Programming perspective: 1. Single-transaction programming: Send+receive count as well as send buffer and optionally receive buffer supplied in a single call, CS (chip select) happens automatically at the beginning and end of the transaction 2. Separate CS programming, single-transaction buffer handling: Activate CS, send+receive count as well as send buffer and optionally receive buffer supplied in a single call, deactivate CS 3. Separate CS programming, two-transaction buffer handling: Activate CS, send count+buffer in one call, optional receive count+buffer in one call, deactivate CS 4. Separate CS programming, multi-transaction buffer handling: Activate CS, send count+buffer in multiple calls, optional receive count+buffer in multiple calls, deactivate CS
The spi flash drivers previously made assumptions around #2 in that they would/could send din/dout in a full duplex manner assuming #2 semantics -- basically very not normal spi.
What you describe sounds like single-transaction half-duplex.
However, most/all(?) Intel drivers and the Mediatek one seem to instead implement a "one after the other" behavior: the controller first clocks out
x86 spi *flash* controllers do that for sure, but that's to handle true spi flash operations that are essentially two transactions. mediatek does that for reasons I'm not aware of.
x86 AMD SB600 (and later) SPI flash controllers perform the first byte half-duplex and the rest full-duplex.
SPI flash operations are single transactions on the bus (CS stays active between the sending and receiving). Quite a few APIs seem to struggle with that, see the Linux SPI API for example.
the dout array (discarding any bits received on MISO during this time), and only once that is done and bytesout clock cycles have passed does it start reading bytes into the din array for another bytesin cycles (clocking out zeroes or some arbitrary value on MOSI during that time). There's also the global spi_xfer_two_vectors() helper used by most of the above-mentioned controllers (as their xfer_vector() callback) which combines two struct spi_op vectors into a single xfer() call iff the first one only has a dout and the second one only a din.
That's just to combine the two ops spi flash cmds follow. Those helpers are there only for that purpose. If you look at the spi flash API buffers are passed in for getting data back, etc. There's no where to eat the setup cmd pieces w/o sending two ops down to the controller aside from realloc'ing a buffer with the larger size then memcpy()'ing back into the original buffer. Otherwise you are putting the burden on the higher layer callers to somehow know they need to pad their buffers. Also, the x86 spi flash controllers need all the info at once as a fused op.
Clearly, we shouldn't just let every controller decide on its own how to interpret a transfer command, because they mean two very different things.
One could never connect a generic spi device to an x86 spi *flash* controller. That's similarly why is the special function flash_probe() in spi_ctrlr. It's to bypass the spi driver probing because it's not possible to probe the way the drivers are written.
Are you referring to the painful limitations of some flash controllers which do not support reading or writing an arbitrary number of bytes between zero (receiving) or one (sending) and the controller limit?
An SPI ROM read transaction packed into a single xfer() call (command in dout, data read into din) would work fine with the second interpretation but return garbage (first 4 bytes dropped and everything else shifted) with the first interpretation.
The other way round: It would return 4 bytes of garbage, then everything but the last 4 bytes shifted, and the last 4 bytes woud be dropped.
A transaction from a full duplex driver that really needs to send and receive at the same time wouldn't be possible with the second interpretation at all.
See my note about no way of being able to do that w/o allocating buffers to eat unneeded bytes. One could introduce offset into spi_op but that just makes things more complicated for the controller implementation. It's easier to just chunk the cmd-resp pair into a two requests.
If you want to handle the FAST READ flash command, you need to drop a number of clock cycles (usually 8 clocks for 1 byte in the single-IO case).
I think we need to make a decision how we want the API to work, clearly document that to remove future confusion, and then we can maybe simplify
It handles all the scenarios. Implement xfer() and things will work fine.
some parts a little. If we decide that full-duplex is correct then I think that spi_xfer_two_vectors() function needs to go because it's just clearly doing the wrong thing in that case. If some controllers require to do the
We should rename it and/or make it clearer what the intended usage is. It's a helper for spi *flash* controllers that can't really deal with real spi semantics.
out and in of a SPI flash operation in one go, they'll have to implement xfer_vector() themselves and not just pipe it through to xfer(). If a controller doesn't support full-duplex transfers, xfer() should give a clear error if both din and dout are provided, not just do the wrong thing.
spi_xfer_vector() already exists and works as one would expect.
If we decide that "first out, then in" is the intended meaning, the remaining controllers that implement full-duplex should be changed to work that way. After that we could simplify the SPI flash code because it can just do a read operation with a single xfer(). We could probably get rid of the whole xfer_vector() callback then because it seems that its only used by the flash driver to coalesce command and response right now, and all controllers that implement it just use the spi_xfer_two_vectors() helper.
Have you tried looking at how flashrom does it?
I don't think all generic spi controllers use that at all. Only the ones that are hardware limited for spi flash or the driver is limited by implementation.
But if we do it this way, we'd never be able to support full-duplex drivers if we ever needed them.
If we ever need full-duplex drivers for flash, the flash command set will have changed. If you want to use SPI as high-speed bidirectional serial debugging line, full-duplex may be useful in some cases. If you want to use SPI for attaching other peripherals, full-duplex may be a requirement (but probably not in coreboot unless those peripherals have to be initialized).
Regards, Carl-Daniel