My question is basically what happens when bytesout and bytesin are both non-zero. My previous understanding was always that the SPI controller shifts dout and din simultaneously in that case (i.e. full duplex), so a caller wanting command-then-response semantics would have to eat the command bytes at the start of the receive buffer.
It's up to the spi controller itself. They should support full duplex, if possible.
And if not? The problem is that right now xfer() means two different things depending on which controller implementation you talk to. Isn't that a pretty bad situation? The point of having an API is that you can program something without having to care (too much) what implements it on the other side. The behavior of the same call shouldn't change based on the implementation, other than aborting for requests that the implementation cannot support.
I think if we want to allow full-duplex, then any controllers that don't support full-duplex should error out if their xfer() is called with two buffers. They shouldn't just do something different than what the controllers which support full-duplex would do if they were called the same way. If the caller wants to transfer things one after the other, it needs to use xfer_vector() with two vectors or do two separate xfer() calls.
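To illustrate what I mean, a half-duplex controller's xfer() could just reject the full-duplex case instead of silently reinterpreting it. A minimal sketch, assuming coreboot's struct spi_ctrlr callback signature; do_tx()/do_rx() are hypothetical hardware helpers:

    #include <spi-generic.h>

    /*
     * Sketch of an xfer() for a controller that cannot do full duplex:
     * refuse the request instead of silently doing send-then-receive.
     */
    static int half_duplex_xfer(const struct spi_slave *slave,
                                const void *dout, size_t bytesout,
                                void *din, size_t bytesin)
    {
            if (bytesout && bytesin)
                    return -1;      /* full duplex not supported */

            if (bytesout)
                    return do_tx(slave, dout, bytesout);
            if (bytesin)
                    return do_rx(slave, din, bytesin);
            return 0;
    }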
What you get is cmd *then* response behavior -- and as you noted, you'd have to eat the cmd bytes in the receive buffer. The issue is that we are dealing with 2 different types of spi *flash* controllers:
- generic spi controller that can do any normal spi thing
- spi flash controllers on x86 systems; these are not normal spi controllers and can't really be treated like one.
The spi flash drivers previously made assumptions around #2: they would/could pass din and dout together in a single call (as if requesting full duplex) while assuming #2's cmd-then-response semantics -- basically very not normal spi.
Right. So I think the way spi_flash.c is written right now looks good, if we want to consider transfers full-duplex. do_spi_flash_cmd() correctly builds a vector with two separate transactions, one out and one in. My problem is just that the Intel controllers then implement xfer_vector() with spi_xfer_two_vectors() which just merges it back together into a single xfer() call, with different semantics than what the other (full-duplex-supporting) controllers use for xfer(). That just seems wrong... xfer() is an external part of the SPI controller API so it should have the same semantics for all controllers. If the Intel flash controllers want to do something special with what do_spi_flash_cmd() gives them, they should have their own custom implementation of xfer_vector() that doesn't call xfer() and instead calls something internal to them which implements this behavior, while xfer() should conform to the standard API (as best as it can).
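For reference, the vector that do_spi_flash_cmd() hands down has roughly this shape (paraphrased from spi_flash.c; details may differ slightly):

    struct spi_op vectors[] = {
            /* op 0: command (and address) out, nothing in */
            { .dout = dout, .bytesout = bytes_out,
              .din = NULL, .bytesin = 0 },
            /* op 1: nothing out, response in */
            { .dout = NULL, .bytesout = 0,
              .din = din, .bytesin = bytes_in },
    };

    ret = spi_xfer_vector(spi, vectors, ARRAY_SIZE(vectors));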
The mediatek driver does that too (implements xfer_vector() with spi_xfer_two_vectors()), for reasons I'm not aware of.
I think they just copied off the wrong implementation when writing that driver, and I didn't notice during review. Like I'm saying, I think the current situation is very confusing, especially to vendors who see it for the first time. The MediaTek controller is definitely capable of full duplex AFAIK; the driver is just written in a way that explicitly avoids full duplex right now.
That's just to combine the two ops spi flash cmds follow. Those helpers are there only for that purpose. If you look at the spi flash API, buffers are passed in for getting data back, etc. There's nowhere to eat the setup cmd pieces w/o sending two ops down to the controller, aside from realloc'ing a buffer with the larger size and then memcpy()'ing back into the original buffer. Otherwise you are putting the burden on the higher-layer callers to somehow know they need to pad their buffers. Also, the x86 spi flash controllers need all the info at once as a fused op.
Right, I get that those x86 flash controllers need to treat this specially. I just think their xfer_vector() implementation shouldn't go back and call xfer() like that (in a way that xfer() is normally not meant to work). Their xfer_vector() should do the operation directly, and their xfer() should be a different implementation that follows the normal API for xfer() as best as it can (or if they can't do that at all they just shouldn't provide an xfer() callback).
One could never connect a generic spi device to an x86 spi *flash* controller. That's similarly why the special flash_probe() function exists in spi_ctrlr: it's there to bypass the spi driver probing, because probing isn't possible the way those drivers are written.
Okay. It's fair that some controllers can only support very special use cases. But the spi_ctrlr API overall is still a generic SPI API that may be used for flash and for other things, so I think xfer() and xfer_vector() always need to have the same semantics for everything that implements the API. If a controller cannot implement anything that's not an xfer_vector with exactly two transfers (one out and one in), it needs to return an error if it's invoked in any other way. It shouldn't just do something that's not conforming to the way other controllers would treat that call instead.
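Something like this is what I have in mind -- a sketch only, with do_fused_cmd() standing in for whatever programs the hardware's fused operation:

    #include <spi-generic.h>

    /* Sketch: accept only the exact "command out, response in" pair
     * that do_spi_flash_cmd() produces; error out on everything else. */
    static int fused_only_xfer_vector(const struct spi_slave *slave,
                                      struct spi_op vectors[], size_t count)
    {
            if (count != 2 ||
                !vectors[0].bytesout || vectors[0].bytesin ||
                vectors[1].bytesout || !vectors[1].bytesin)
                    return -1;

            return do_fused_cmd(slave,
                                vectors[0].dout, vectors[0].bytesout,
                                vectors[1].din, vectors[1].bytesin);
    }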
Speaking with my flashrom hat on, it's a bit more complicated.
Duplex perspective: There are two main types (and a few subtypes) of SPI controllers used for flash:

1.) Full-duplex controllers: Capable of sending and receiving at the same time
1.a) For each clock cycle, a bit is sent and a bit is received at the same time (e.g. Raspberry Pi SPI)
1.b) For the first 8 clock cycles, a bit is sent. After that, for each clock cycle, a bit is sent and a bit is received at the same time (e.g. AMD SB600 and successors)
1.c) After a defined number of clock cycles, a constant number of bits is ignored in both directions (e.g. 8 bits ignored for FAST READ command, supported by some controllers)

2.) Half-duplex controllers: Capable of either sending or receiving at any given time
2.a) Send-then-receive: A number of bits (>0) is sent, and after that a number of bits (>=0) is received (e.g. Intel SPI)
2.b) After a defined number of clock cycles, a constant number of bits is ignored in both directions (e.g. 8 bits ignored for FAST READ command, supported by some controllers)
I think from this perspective, the coreboot API should always assume 1.a. All other controllers need to interpret API requests that way and try to fulfill them with what they have. For example, the AMD controller that can do full duplex for all but the first 8 cycles (1.b) would need a custom xfer_vector() implementation that checks whether it was given two vectors, where the first one has 8 bits (one command byte) out with no bytes in, and the second one has N bytes out and N bytes in. It could fulfill that, and for other requests that it can't fulfill it would have to return an error.
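The shape check for that hypothetical 1.b implementation could look like this (again just a sketch; do_cmd_then_duplex() is a made-up hardware helper):

    #include <spi-generic.h>

    /* Sketch for the 1.b model: hardware sends the first 8 clocks
     * one-way, then runs full duplex. Accept exactly "1 byte out"
     * followed by "N bytes out + N bytes in"; reject anything else. */
    static int cmd_then_duplex_xfer_vector(const struct spi_slave *slave,
                                           struct spi_op vectors[],
                                           size_t count)
    {
            if (count != 2 ||
                vectors[0].bytesout != 1 || vectors[0].bytesin != 0 ||
                vectors[1].bytesout != vectors[1].bytesin)
                    return -1;

            return do_cmd_then_duplex(slave, vectors[0].dout,
                                      vectors[1].dout, vectors[1].din,
                                      vectors[1].bytesout);
    }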
Programming perspective:
1. Single-transaction programming: Send+receive count as well as send buffer and optionally receive buffer supplied in a single call; CS (chip select) happens automatically at the beginning and end of the transaction
This model can't use the normal SPI API at all and instead has to implement a custom flash_probe() function where it can install its own spi_flash_ops object. We have some controllers like that already.
2. Separate CS programming, single-transaction buffer handling: Activate CS, send+receive count as well as send buffer and optionally receive buffer supplied in a single call, deactivate CS
This one can use the SPI API and only has to implement xfer(). spi-generic.c will automatically turn spi_xfer_vector() calls into multiple invocations of xfer() for it (see the sketch after this list).
3. Separate CS programming, two-transaction buffer handling: Activate CS, send count+buffer in one call, optional receive count+buffer in one call, deactivate CS
4. Separate CS programming, multi-transaction buffer handling: Activate CS, send count+buffer in multiple calls, optional receive count+buffer in multiple calls, deactivate CS
Both of these can use the SPI API but need to implement their own xfer_vector(). In there they could check whether the vectors they were given match the transaction model they support, and if not they would have to return an error. They should not implement the normal xfer() callback at all, unless they can support a transaction with a single buffer.
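For reference, the spi-generic.c fallback mentioned for the second model is essentially this loop (paraphrased; the in-tree code differs in details):

    #include <spi-generic.h>

    /* Paraphrased from spi-generic.c: with no xfer_vector() callback,
     * each vector entry is simply handed to the controller's xfer(). */
    static int xfer_vector_fallback(const struct spi_slave *slave,
                                    struct spi_op vectors[], size_t count)
    {
            size_t i;

            for (i = 0; i < count; i++) {
                    int ret = slave->ctrlr->xfer(slave,
                                                 vectors[i].dout,
                                                 vectors[i].bytesout,
                                                 vectors[i].din,
                                                 vectors[i].bytesin);
                    if (ret)
                            return ret;
            }
            return 0;
    }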
An SPI ROM read transaction packed into a single xfer() call (command in dout, data read into din) would work fine with the second interpretation but return garbage (first 4 bytes dropped and everything else shifted) with the first interpretation.
The other way round: it would return 4 bytes of garbage, then everything but the last 4 bytes shifted, and the last 4 bytes would be dropped.
Right, thanks, it's that way around.
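To spell that out with a concrete example: take a 4-byte read command and an 8-byte din buffer packed into one xfer() call (c0..c3 = command bytes, d0..d7 = flash data, xx = garbage, byte times left to right):

    send-then-receive (2.a), 12 byte times on the wire:
        out:  c0  c1  c2  c3  --  --  --  --  --  --  --  --
        in:   --  --  --  --  d0  d1  d2  d3  d4  d5  d6  d7
        -> din receives d0..d7 as intended

    full duplex (1.a), 8 byte times on the wire:
        out:  c0  c1  c2  c3  --  --  --  --
        in:   xx  xx  xx  xx  d0  d1  d2  d3
        -> din receives 4 garbage bytes, then d0..d3; d4..d7 are never clocked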