On 08/06/15 14:35, Stefan Hajnoczi wrote:
On Thu, Aug 6, 2015 at 12:02 PM, Marc Marí markmb@redhat.com wrote:
- if (qemu_cfg_dma_enabled()) {
QemuCfgDmaAccess access;
access.address = (u64)(u32)buf;
access.length = len;
access.control = QEMU_CFG_DMA_CTL_READ;
/*
* The out is done before the write of the variables on memory. This
* causes misread on the QEMU side.
*/
barrier();
outl((u32)&access, PORT_QEMU_CFG_DMA_ADDR);
I thought PORT_QEMU_CFG_DMA_ADDR is a 64-bit register according to the spec you posted?
while(access.length != 0 && !(access.control & QEMU_CFG_DMA_CTL_ERROR));
Either the field accesses need to be marked volatile, or a barrier is needed to force the compiler to reload these register from memory each iteration of the loop.
Better yet, I would recommend declaring (and here, also defining) "access" as "volatile".
volatile QemuCfgDmaAccess access;
I think the problem is that there are no "sequence points" (according to the C language specification) in this loop, so the compiler may assume that access.length and access.control will not change. https://en.wikipedia.org/wiki/Sequence_point
The loop does contain sequence points; three in total.
Appendix C of the ISO C spec gives an informative summary on the sequence points (with references to the individual normative passages). Quoting from appendix C:
1 The following are the sequence points described in 5.1.2.3:
[...]
- The end of the first operand of the following operators: logical AND && (6.5.13); [...]
[...]
- The end of a full expression: [...] the expression in an expression statement (6.8.3); [...] the controlling expression of a while or do statement (6.8.5); [...]
- Here we have the controlling expression of a while statement. - In that expression, we also have a logical AND operator. - And the null statement in the loop body is an expression statement.
The lack of sequence points is not the problem (we do have sequence points). The problem is that without "volatile", the compiler might optimize out the accesses, because it deduces that the expressions produce no side effects.
5.1.2.3 Program execution
[...]
2 Accessing a volatile object, [...] are all side effects, which are changes in the state of the execution environment. Evaluation of an expression may produce side effects. At certain specified points in the execution sequence called sequence points, all side effects of previous evaluations shall be complete and no side effects of subsequent evaluations shall have taken place. [...]
3 In the abstract machine, all expressions are evaluated as specified by the semantics. An actual implementation need not evaluate part of an expression if it can deduce that its value is not used and that no needed side effects are produced (including any caused by calling a function or accessing a volatile object).
[...]
5 The least requirements on a conforming implementation are:
- At sequence points, volatile objects are stable in the sense that previous accesses are complete and subsequent accesses have not yet occurred. [...]
[...]
6.7.3 Type qualifiers
[...]
6 An object that has volatile-qualified type may be modified in ways unknown to the implementation or have other unknown side effects. Therefore any expression referring to such an object shall be evaluated strictly according to the rules of the abstract machine, as described in 5.1.2.3. Furthermore, at every sequence point the value last stored in the object shall agree with that prescribed by the abstract machine, except as modified by the unknown factors mentioned previously. [footnote 116] What constitutes an access to an object that has volatile-qualified type is implementation- defined.
Footnote 116 (informative):
A volatile declaration may be used to describe an object corresponding to a memory-mapped input/output port or an object accessed by an asynchronously interrupting function. Actions on objects so declared shall not be "optimized out" by an implementation or reordered except as permitted by the rules for evaluating expressions.
So, this covers both the initial population of "access", and the later reads.
(If "access" is declared "volatile", then even the current barrier() can be removed, which is otherwise
src/types.h:#define barrier() __asm__ __volatile__("": : :"memory")
ie. a compiler barrier.)
Thanks Laszlo
On Thu, Aug 06, 2015 at 06:15:57PM +0200, Laszlo Ersek wrote:
On 08/06/15 14:35, Stefan Hajnoczi wrote:
On Thu, Aug 6, 2015 at 12:02 PM, Marc Marí markmb@redhat.com wrote:
- if (qemu_cfg_dma_enabled()) {
QemuCfgDmaAccess access;
access.address = (u64)(u32)buf;
access.length = len;
access.control = QEMU_CFG_DMA_CTL_READ;
/*
* The out is done before the write of the variables on memory. This
* causes misread on the QEMU side.
*/
barrier();
outl((u32)&access, PORT_QEMU_CFG_DMA_ADDR);
I thought PORT_QEMU_CFG_DMA_ADDR is a 64-bit register according to the spec you posted?
while(access.length != 0 && !(access.control & QEMU_CFG_DMA_CTL_ERROR));
Either the field accesses need to be marked volatile, or a barrier is needed to force the compiler to reload these register from memory each iteration of the loop.
Better yet, I would recommend declaring (and here, also defining) "access" as "volatile".
volatile QemuCfgDmaAccess access;
I'd prefer to avoid volatile for the reasons described at:
https://www.kernel.org/doc/Documentation/volatile-considered-harmful.txt
I think barrier() and yield() are what we want - in particular, we really do want to yield() while busy so that interrupts and other code can continue to run.
-Kevin
On 08/06/15 20:59, Kevin O'Connor wrote:
On Thu, Aug 06, 2015 at 06:15:57PM +0200, Laszlo Ersek wrote:
On 08/06/15 14:35, Stefan Hajnoczi wrote:
On Thu, Aug 6, 2015 at 12:02 PM, Marc Marí markmb@redhat.com wrote:
- if (qemu_cfg_dma_enabled()) {
QemuCfgDmaAccess access;
access.address = (u64)(u32)buf;
access.length = len;
access.control = QEMU_CFG_DMA_CTL_READ;
/*
* The out is done before the write of the variables on memory. This
* causes misread on the QEMU side.
*/
barrier();
outl((u32)&access, PORT_QEMU_CFG_DMA_ADDR);
I thought PORT_QEMU_CFG_DMA_ADDR is a 64-bit register according to the spec you posted?
while(access.length != 0 && !(access.control & QEMU_CFG_DMA_CTL_ERROR));
Either the field accesses need to be marked volatile, or a barrier is needed to force the compiler to reload these register from memory each iteration of the loop.
Better yet, I would recommend declaring (and here, also defining) "access" as "volatile".
volatile QemuCfgDmaAccess access;
I'd prefer to avoid volatile for the reasons described at:
https://www.kernel.org/doc/Documentation/volatile-considered-harmful.txt
Even that document concedes that "volatile" is appropriate for memory mapped IO.
Personally, the way I distinguish "volatile" from atomics and SMP barriers is this: in his presentations entitled "atomic<> Weapons":
https://channel9.msdn.com/Shows/Going+Deep/Cpp-and-Beyond-2012-Herb-Sutter-a... https://channel9.msdn.com/Shows/Going+Deep/Cpp-and-Beyond-2012-Herb-Sutter-a...
Herb Sutter said (I'm paraphrasing from memory): "use volatile for memory mapped IO, that is, for communication between a CPU and a device. Use atomics, memory barriers, locking etc. for synchronization between CPUs, eg. with multi-threaded code."
I have found that a very concise and helpful pair of sentences.
In this instance, we're definitely doing IO, so volatile is appropriate.
Clearly though, I defer to your judgement re SeaBIOS code :)
I think barrier() and yield() are what we want - in particular, we really do want to yield() while busy so that interrupts and other code can continue to run.
If you prefer barrier() over volatile, then, I believe:
- barrier() is sufficient between the filling of the struct and the outl().
- the read loop should be probably re-coded as:
do { barrier(); } while (access.length != 0 && !(access.control & QEMU_CFG_DMA_CTL_ERROR));
Re yield(), I guess that could be necessary in SeaBIOS, yes. (In edk2, there is only one interrupt, the timer interrupt, and it is serviced (= timer callbacks are dispatched) by default unless the "task priority level" is raised enough.)
Thanks Laszlo
-Kevin
SeaBIOS mailing list SeaBIOS@seabios.org http://www.seabios.org/mailman/listinfo/seabios
On Thu, Aug 06, 2015 at 09:34:00PM +0200, Laszlo Ersek wrote:
On 08/06/15 20:59, Kevin O'Connor wrote:
On Thu, Aug 06, 2015 at 06:15:57PM +0200, Laszlo Ersek wrote:
Better yet, I would recommend declaring (and here, also defining) "access" as "volatile".
volatile QemuCfgDmaAccess access;
I'd prefer to avoid volatile for the reasons described at:
https://www.kernel.org/doc/Documentation/volatile-considered-harmful.txt
Even that document concedes that "volatile" is appropriate for memory mapped IO.
Personally, the way I distinguish "volatile" from atomics and SMP barriers is this: in his presentations entitled "atomic<> Weapons":
https://channel9.msdn.com/Shows/Going+Deep/Cpp-and-Beyond-2012-Herb-Sutter-a... https://channel9.msdn.com/Shows/Going+Deep/Cpp-and-Beyond-2012-Herb-Sutter-a...
Thanks - I'll add the above to my queue.
Herb Sutter said (I'm paraphrasing from memory): "use volatile for memory mapped IO, that is, for communication between a CPU and a device. Use atomics, memory barriers, locking etc. for synchronization between CPUs, eg. with multi-threaded code."
I have found that a very concise and helpful pair of sentences.
In this instance, we're definitely doing IO, so volatile is appropriate.
In this code, the QemuCfgDmaAccess is not memory mapped IO. It's a regular struct that lives in regular memory. The "device" in this case is reading the struct from memory, performing some action, and then writing back to the struct in memory. So, to me this is more similar to multi-threaded code (ie, the "device" could be seen as similar to another CPU).
Of course, the action isn't asynchronous in the current proposed QEMU implementation, but I understand that making the interface future proof in that regard was the goal.
Clearly though, I defer to your judgement re SeaBIOS code :)
I think barrier() and yield() are what we want - in particular, we really do want to yield() while busy so that interrupts and other code can continue to run.
If you prefer barrier() over volatile, then, I believe:
barrier() is sufficient between the filling of the struct and the outl().
the read loop should be probably re-coded as:
do { barrier(); } while (access.length != 0 && !(access.control & QEMU_CFG_DMA_CTL_ERROR));
I'd probably do something like:
barrier(); outl((u32)&access, PORT_QEMU_CFG_DMA_ADDR); while (access.control & QEMU_CFG_DMA_CTL_READ) yield();
Re yield(), I guess that could be necessary in SeaBIOS, yes. (In edk2, there is only one interrupt, the timer interrupt, and it is serviced (= timer callbacks are dispatched) by default unless the "task priority level" is raised enough.)
Yeah, SeaBIOS is a little more complicated wrt interrupts - there is a brief blurb describing threads and interrupts at: http://www.seabios.org/Execution_and_code_flow
Cheers, -Kevin
On Thu, Aug 6, 2015 at 5:15 PM, Laszlo Ersek lersek@redhat.com wrote:
On 08/06/15 14:35, Stefan Hajnoczi wrote:
On Thu, Aug 6, 2015 at 12:02 PM, Marc Marí markmb@redhat.com wrote:
- if (qemu_cfg_dma_enabled()) {
QemuCfgDmaAccess access;
access.address = (u64)(u32)buf;
access.length = len;
access.control = QEMU_CFG_DMA_CTL_READ;
/*
* The out is done before the write of the variables on memory. This
* causes misread on the QEMU side.
*/
barrier();
outl((u32)&access, PORT_QEMU_CFG_DMA_ADDR);
I thought PORT_QEMU_CFG_DMA_ADDR is a 64-bit register according to the spec you posted?
while(access.length != 0 && !(access.control & QEMU_CFG_DMA_CTL_ERROR));
Either the field accesses need to be marked volatile, or a barrier is needed to force the compiler to reload these register from memory each iteration of the loop.
Better yet, I would recommend declaring (and here, also defining) "access" as "volatile".
volatile QemuCfgDmaAccess access;
I think the problem is that there are no "sequence points" (according to the C language specification) in this loop, so the compiler may assume that access.length and access.control will not change. https://en.wikipedia.org/wiki/Sequence_point
The loop does contain sequence points; three in total.
Appendix C of the ISO C spec gives an informative summary on the sequence points (with references to the individual normative passages). Quoting from appendix C:
1 The following are the sequence points described in 5.1.2.3:
[...] - The end of the first operand of the following operators: logical AND && (6.5.13); [...] [...] - The end of a full expression: [...] the expression in an expression statement (6.8.3); [...] the controlling expression of a while or do statement (6.8.5); [...]
- Here we have the controlling expression of a while statement.
- In that expression, we also have a logical AND operator.
- And the null statement in the loop body is an expression statement.
The lack of sequence points is not the problem (we do have sequence points). The problem is that without "volatile", the compiler might optimize out the accesses, because it deduces that the expressions produce no side effects.
5.1.2.3 Program execution
[...]
2 Accessing a volatile object, [...] are all side effects, which are changes in the state of the execution environment. Evaluation of an expression may produce side effects. At certain specified points in the execution sequence called sequence points, all side effects of previous evaluations shall be complete and no side effects of subsequent evaluations shall have taken place. [...]
3 In the abstract machine, all expressions are evaluated as specified by the semantics. An actual implementation need not evaluate part of an expression if it can deduce that its value is not used and that no needed side effects are produced (including any caused by calling a function or accessing a volatile object).
You are right. I misremembered what sequence points mean. Thanks for the quotes from the standard.
Stefan