[coreboot] How is our SPI API supposed to work?

Thu Apr 19 23:22:37 CEST 2018

Sorry to interrupt, but it seems that there are conflicting definitions
of full-duplex...

On 19.04.2018 05:39, Aaron Durbin via coreboot wrote:
> On Wed, Apr 18, 2018 at 8:46 PM, Julius Werner <jwerner at chromium.org> wrote:
>> I was trying to understand how to best implement the coreboot SPI API for a
>> new SoC, and as I was reading into it I got increasingly confused. This has
>> changed a bit last year (with the addition of struct spi_ctrlr), and I
>> don't think it was fully consistent before that across all implementations
>> either. Basically, we've always had the following base transfer prototype
>> (used to be spi_xfer(), now a callback in struct spi_ctrlr):
>>
>>      int (*xfer)(const struct spi_slave *slave, const void *dout, size_t
>> bytesout, void *din, size_t bytesin);
>>
>> There's also a more recent addition that can handle multiple transfer
>> "vectors" at once, and the code will fall back to the basic xfer() callback
>> if a controller doesn't implement this:
>>
>>      struct spi_op {
>>              const void *dout;
>>              size_t bytesout;
>>              void *din;
>>              size_t bytesin;
>>              enum spi_op_status status;
>>      };
>>      int (*xfer_vector)(const struct spi_slave *slave, struct spi_op
>> vectors[], size_t count);
>>
>> My question is basically what happens when bytesout and bytesin are both
>> non-zero. My previous understanding was always that the SPI controller
> It's up to the spi controller itself. They should support full duplex,
> if possible.

Are you aware of any flash chip supporting full-duplex operation?

>> should then operate in full-duplex mode, clocking out bytes from the dout
>> array while reading bytes into the din array in the same cycle. AFAIK no
>> driver in coreboot actually wants to use the controller this way, but it is
>> a legitimate way to operate a SPI bus and it's not impossible that we may
>> one day have a need for it. The Rockchip, Samsung and (I believe, those are
>> a little harder to read) Nvidia SoC drivers are implemented this way. There
>> is also the following comment (and associated implemented code) in
>> spi_flash.c:do_spi_flash_cmd() which makes me think that this understanding
>> was at least at some point the intended one:
>>
>>          /*
>>           * SPI flash requires command-response kind of behavior. Thus, two
>>           * separate SPI vectors are required -- first to transmit dout and
>> other
>>           * to receive in din. If some specialized SPI flash controllers
>>           * (e.g. x86) can perform both command and response together, it
>> should
>>           * be handled at SPI flash controller driver level.
>>           */
>>           <...array with two spi_op structs, one having only din and one
>> only dout...>
> cmd *then* response behavior. As you noted you'd have to eat the cmd
> bytes in the receive buffer. The issue is that we are dealing with 2
> different types of spi *flash* controllers:
> 1. generic spi controller that can do any normal spi thing
> 2. spi flash controllers on x86 systems. The ones on x86 systems are
> not a normal spi controller, and they can't really be treated like
> one.

Speaking with my flashrom hat on, it's a bit more complicated.

Duplex perspective:
There are two main types (and a few subtypes) of SPI controllers used
for flash:
1.) Full-duplex controllers: Capable of sending and receiving at the
same time
1.a) For each clock cycle, a bit is sent and a bit is received at the
same time (e.g. Raspberry Pi SPI)
1.b) For the first 8 clock cycles, a bit is sent. After that, for each
clock cycle, a bit is sent and a bit is received at the same time (e.g.
AMD SB600 and successors)
1.c) After a defined number of clock cycles, a constant number of bits
is ignored in both directions (e.g. 8 bits ignored for FAST READ
command, supported by some controllers)
2.) Half-duplex controllers: Capable of either sending or receiving at
any given time
2.a) Send-then-receive: A number of bits (>0) is sent, and after that a
number of bits (>=0) is received (e.g. Intel SPI)
2.b) After a defined number of clock cycles, a constant number of bits
is ignored in both directions (e.g. 8 bits ignored for FAST READ
command, supported by some controllers)

Programming perspective:
1. Single-transaction programming: Send+receive count as well as send
buffer and optionally receive buffer supplied in a single call, CS (chip
select) happens automatically at the beginning and end of the transaction
2. Separate CS programming, single-transaction buffer handling: Activate
CS, send+receive count as well as send buffer and optionally receive
buffer supplied in a single call, deactivate CS
3. Separate CS programming, two-transaction buffer handling: Activate
CS, send count+buffer in one call, optional receive count+buffer in one
call, deactivate CS
4. Separate CS programming, multi-transaction buffer handling: Activate
CS, send count+buffer in multiple calls, optional receive count+buffer
in multiple calls, deactivate CS

> The spi flash drivers previously made assumptions around #2 in that
> they would/could send din/dout in a full duplex manner assuming #2
> semantics -- basically very not normal spi.

What you describe sounds like single-transaction half-duplex.

>> However, most/all(?) Intel drivers and the Mediatek one seem to instead
>> implement a "one after the other" behavior: the controller first clocks out
> x86 spi *flash* controllers do that for sure, but that's to handle
> true spi flash operations that are essentially two transactions.
> mediatek does that for reasons I'm not aware of.

x86 AMD SB600 (and later) SPI flash controllers perform the first byte
half-duplex and the rest full-duplex.

SPI flash operations are single transactions on the bus (CS stays active
between the sending and receiving). Quite a few APIs seem to struggle
with that, see the Linux SPI API for example.

>> the dout array (discarding any bits received on MISO during this time), and
>> only once that is done and bytesout clock cycles have passed does it start
>> reading bytes into the din array for another bytesin cycles (clocking out
>> zeroes or some arbitrary value on MOSI during that time). There's also the
>> global spi_xfer_two_vectors() helper used by most of the above-mentioned
>> controllers (as their xfer_vector() callback) which combines two struct
>> spi_op vectors into a single xfer() call iff the first one only has a dout
>> and the second one only a din.
> That's just to combine the two ops spi flash cmds follow. Those
> helpers are there only for that purpose. If you look at the spi flash
> API buffers are passed in for getting data back, etc. There's no where
> to eat the setup cmd pieces w/o sending two ops down to the controller
> aside from realloc'ing a buffer with the larger size then memcpy()'ing
> back into the original buffer. Otherwise you are putting the burden on
> the higher layer callers to somehow know they need to pad their
> buffers. Also, the x86 spi flash controllers need all the info at once
> as a fused op.
>
>> Clearly, we shouldn't just let every controller decide on its own how to
>> interpret a transfer command, because they mean two very different things.
> One could never connect a generic spi device to an x86 spi *flash*
> controller.  That's similarly why is the special function
> flash_probe() in spi_ctrlr. It's to bypass the spi driver probing
> because it's not possible to probe the way the drivers are written.

Are you referring to the painful limitations of some flash controllers
which do not support reading or writing an arbitrary number of bytes
between zero (receiving) or one (sending) and the controller limit?

>> An SPI ROM read transaction packed into a single xfer() call (command in
>> dout, data read into din) would work fine with the second interpretation
>> but return garbage (first 4 bytes dropped and everything else shifted) with
>> the first interpretation.

The other way round: It would return 4 bytes of garbage, then everything
but the last 4 bytes shifted, and the last 4 bytes woud be dropped.

>>  A transaction from a full duplex driver that
>> really needs to send and receive at the same time wouldn't be possible with
>> the second interpretation at all.
> See my note about no way of being able to do that w/o allocating
> buffers to eat unneeded bytes. One could introduce offset into spi_op
> but that just makes things more complicated for the controller
> implementation. It's easier to just chunk the cmd-resp pair into a two
> requests.

If you want to handle the FAST READ flash command, you need to drop a
number of clock cycles (usually 8 clocks for 1 byte in the single-IO case).

>> I think we need to make a decision how we want the API to work, clearly
>> document that to remove future confusion, and then we can maybe simplify
> It handles all the scenarios. Implement xfer() and things will work fine.
>
>> some parts a little. If we decide that full-duplex is correct then I think
>> that spi_xfer_two_vectors() function needs to go because it's just clearly
>> doing the wrong thing in that case. If some controllers require to do the
> We should rename it and/or make it clearer what the intended usage is.
> It's a helper for spi *flash* controllers that can't really deal with
> real spi semantics.
>
>> out and in of a SPI flash operation in one go, they'll have to implement
>> xfer_vector() themselves and not just pipe it through to xfer(). If a
>> controller doesn't support full-duplex transfers, xfer() should give a
>> clear error if both din and dout are provided, not just do the wrong thing.
> spi_xfer_vector() already exists and works as one would expect.
>
>> If we decide that "first out, then in" is the intended meaning, the
>> remaining controllers that implement full-duplex should be changed to work
>> that way. After that we could simplify the SPI flash code because it can
>> just do a read operation with a single xfer(). We could probably get rid of
>> the whole xfer_vector() callback then because it seems that its only used
>> by the flash driver to coalesce command and response right now, and all
>> controllers that implement it just use the spi_xfer_two_vectors() helper.

Have you tried looking at how flashrom does it?

> I don't think all generic spi controllers use that at all. Only the
> ones that are hardware limited for spi flash or the driver is limited
> by implementation.
>
>> But if we do it this way, we'd never be able to support full-duplex drivers
>> if we ever needed them.

If we ever need full-duplex drivers for flash, the flash command set
will have changed.
If you want to use SPI as high-speed bidirectional serial debugging
line, full-duplex may be useful in some cases.
If you want to use SPI for attaching other peripherals, full-duplex may
be a requirement (but probably not in coreboot unless those peripherals
have to be initialized).

Regards,
Carl-Daniel