I was trying to understand how to best implement the coreboot SPI API for a new SoC, and as I was reading into it I got increasingly confused. This has changed a bit last year (with the addition of struct spi_ctrlr), and I don't think it was fully consistent before that across all implementations either. Basically, we've always had the following base transfer prototype (used to be spi_xfer(), now a callback in struct spi_ctrlr):
int (*xfer)(const struct spi_slave *slave, const void *dout, size_t bytesout, void *din, size_t bytesin);
There's also a more recent addition that can handle multiple transfer "vectors" at once, and the code will fall back to the basic xfer() callback if a controller doesn't implement this:
struct spi_op { const void *dout; size_t bytesout; void *din; size_t bytesin; enum spi_op_status status; }; int (*xfer_vector)(const struct spi_slave *slave, struct spi_op vectors[], size_t count);
My question is basically what happens when bytesout and bytesin are both non-zero. My previous understanding was always that the SPI controller should then operate in full-duplex mode, clocking out bytes from the dout array while reading bytes into the din array in the same cycle. AFAIK no driver in coreboot actually wants to use the controller this way, but it is a legitimate way to operate a SPI bus and it's not impossible that we may one day have a need for it. The Rockchip, Samsung and (I believe, those are a little harder to read) Nvidia SoC drivers are implemented this way. There is also the following comment (and associated implemented code) in spi_flash.c:do_spi_flash_cmd() which makes me think that this understanding was at least at some point the intended one:
/* * SPI flash requires command-response kind of behavior. Thus, two * separate SPI vectors are required -- first to transmit dout and other * to receive in din. If some specialized SPI flash controllers * (e.g. x86) can perform both command and response together, it should * be handled at SPI flash controller driver level. */ <...array with two spi_op structs, one having only din and one only dout...>
However, most/all(?) Intel drivers and the Mediatek one seem to instead implement a "one after the other" behavior: the controller first clocks out the dout array (discarding any bits received on MISO during this time), and only once that is done and bytesout clock cycles have passed does it start reading bytes into the din array for another bytesin cycles (clocking out zeroes or some arbitrary value on MOSI during that time). There's also the global spi_xfer_two_vectors() helper used by most of the above-mentioned controllers (as their xfer_vector() callback) which combines two struct spi_op vectors into a single xfer() call iff the first one only has a dout and the second one only a din.
Clearly, we shouldn't just let every controller decide on its own how to interpret a transfer command, because they mean two very different things. An SPI ROM read transaction packed into a single xfer() call (command in dout, data read into din) would work fine with the second interpretation but return garbage (first 4 bytes dropped and everything else shifted) with the first interpretation. A transaction from a full duplex driver that really needs to send and receive at the same time wouldn't be possible with the second interpretation at all.
I think we need to make a decision how we want the API to work, clearly document that to remove future confusion, and then we can maybe simplify some parts a little. If we decide that full-duplex is correct then I think that spi_xfer_two_vectors() function needs to go because it's just clearly doing the wrong thing in that case. If some controllers require to do the out and in of a SPI flash operation in one go, they'll have to implement xfer_vector() themselves and not just pipe it through to xfer(). If a controller doesn't support full-duplex transfers, xfer() should give a clear error if both din and dout are provided, not just do the wrong thing.
If we decide that "first out, then in" is the intended meaning, the remaining controllers that implement full-duplex should be changed to work that way. After that we could simplify the SPI flash code because it can just do a read operation with a single xfer(). We could probably get rid of the whole xfer_vector() callback then because it seems that its only used by the flash driver to coalesce command and response right now, and all controllers that implement it just use the spi_xfer_two_vectors() helper. But if we do it this way, we'd never be able to support full-duplex drivers if we ever needed them.