Hi,
I rewrote the ASSERT() and BUG() implementations (src/include/assert.h) from scratch. The old one used different messages for preram and ram stages and would not print any warnings if CONFIG_DEBUG was disabled. In the case of CONFIG_DEBUG the code would die().
I wonder if that's the behavior we want. If something is bad enough to stop a system during development, maybe the warning should not be ignored completely when it happens in a productive system? I think we should always print a warning if the code is inconsistent.
Also, do we want to die() on an assert? I believe in most cases we don't.The worst case that happens when we run into a bug or assert situation is that we can not boot the system. But in some cases it's not that bad.. like in the mptable / acpi generator or some IOAPIC and SuperIO drivers. We might still be able to boot into a system and flash a new, fixed coreboot image, but in case of a die() we desperately brick the system. I think we should not do that.
What do you think?
Stefan
Stefan Reinauer wrote:
Also, do we want to die() on an assert?
Ideally I would like coreboot to be so structured that an assert just leads to the current "task" being aborted, and then we try to continue as best as possible.
It would be great to also have a panic room, but I agree that it should only be called fairly infrequently. Ie. I'd like to do away with die() in the long term.
//Peter
On 2/22/10 1:33 PM, Peter Stuge wrote:
Stefan Reinauer wrote:
Also, do we want to die() on an assert?
Ideally I would like coreboot to be so structured that an assert just leads to the current "task" being aborted, and then we try to continue as best as possible.
Yes. I agree. Intuitively I'd think that this is better written down in code at the place where it happens, though, instead of trying to do magic in order to detect which scope to drop out to. If no magic is required, I'm all for doing this. In Forth or C++ or Java we could do a try { ... } catch (...) kind of thing.
It would be great to also have a panic room, but I agree that it should only be called fairly infrequently. Ie. I'd like to do away with die() in the long term.
I still try to fancy what that panic room is. Is it the gdb handler for the case that gdb is enabled?
Stefan
On Mon, Feb 22, 2010 at 7:02 AM, Stefan Reinauer stepan@coresystems.de wrote:
I still try to fancy what that panic room is. Is it the gdb handler for the case that gdb is enabled?
in my ideal world, the panic room is SerialICE. I just had a talk with someone over the weekend (name not mentioned to protect the innocent -- me :-) and I think this ought to be workable.
a hlt() in case of die() just seems too final. How much more useful if it dropped back to serialice and one could talk to coreboot and find out, from the coreboot log, what just happened.
ron
ron minnich wrote:
I still try to fancy what that panic room is. Is it the gdb handler for the case that gdb is enabled?
in my ideal world, the panic room is SerialICE.
That's a nice idea.
I think interesting suggestions for panic room go from debugging (SerialICE, gdb) to monitor (console menu for file transfer and flashing) and beyond.
For board porters it's definately most useful to get debugging. For production use maybe monitor mode is more useful. Or even something like a RAM scrubber as a simple failsafe?
I guess we don't want to limit ourselves to just one choice here.
//Peter
On 2/22/10 5:59 PM, Peter Stuge wrote:
ron minnich wrote:
I still try to fancy what that panic room is. Is it the gdb handler for the case that gdb is enabled?
in my ideal world, the panic room is SerialICE.
That's a nice idea.
I think interesting suggestions for panic room go from debugging (SerialICE, gdb) to monitor (console menu for file transfer and flashing) and beyond.
For board porters it's definately most useful to get debugging. For production use maybe monitor mode is more useful. Or even something like a RAM scrubber as a simple failsafe?
I guess we don't want to limit ourselves to just one choice here.
Well, one choice is better than none and the world was not invented in a day ...
On 22.02.2010 18:49, Stefan Reinauer wrote:
On 2/22/10 5:59 PM, Peter Stuge wrote:
ron minnich wrote:
I still try to fancy what that panic room is. Is it the gdb handler for the case that gdb is enabled?
in my ideal world, the panic room is SerialICE.
That's a nice idea.
I think interesting suggestions for panic room go from debugging (SerialICE, gdb) to monitor (console menu for file transfer and flashing) and beyond.
For board porters it's definately most useful to get debugging. For production use maybe monitor mode is more useful. Or even something like a RAM scrubber as a simple failsafe?
I guess we don't want to limit ourselves to just one choice here.
Well, one choice is better than none and the world was not invented in a day ...
If SerialICE is relocatable to have all code in CAR and if it can do batched commands (<start_transaction>, <bunch_of_memory_writes>, <end_transaction_and_execute_now>), it is reasonably easy to have flashrom rewrite the ROM over the SerialICE interface. IMHO that's a pretty good way to recover from a non-booting machine because it only needs a serial line (or whatever you're using to communicate with SerialICE).
Regards, Carl-Daniel
On Mon, 22 Feb 2010 12:42:48 +0100, Stefan Reinauer stepan@coresystems.de wrote:
Hi,
I rewrote the ASSERT() and BUG() implementations (src/include/assert.h) from scratch. The old one used different messages for preram and ram stages and would not print any warnings if CONFIG_DEBUG was disabled. In the case of CONFIG_DEBUG the code would die().
I wonder if that's the behavior we want. If something is bad enough to stop a system during development, maybe the warning should not be ignored completely when it happens in a productive system? I think we should always print a warning if the code is inconsistent.
Also, do we want to die() on an assert? I believe in most cases we don't.The worst case that happens when we run into a bug or assert situation is that we can not boot the system. But in some cases it's not that bad.. like in the mptable / acpi generator or some IOAPIC and SuperIO drivers. We might still be able to boot into a system and flash a new, fixed coreboot image, but in case of a die() we desperately brick the system. I think we should not do that.
What do you think?
I use die in raminit for memory compatibility checks. If the memory is not compatible, there is no use moving on...so we die().
On 2/22/10 1:56 PM, Joseph Smith wrote:
I use die in raminit for memory compatibility checks. If the memory is not compatible, there is no use moving on...so we die().
I agree... to some extent...
The file I posted does not change the explicit die() calls though but only the implicit die() in the two macros ASSERT and BUG.
As for memory compatibility, I saw the same thing on i945 - it does not support ECC RAM and some closed source BIOSes just die() in the case of ECC memory detected. However, it is possible to use the memory in non-ECC mode, so not using die() might still be worth thinking about in such a case. Worst case ram init does not work and then coreboot die()s automatically when it tries jumping to stage 2 in RAM. It should, however, always print a big fat BIOS_ERR warning in that case.
Stefan
On Mon, 22 Feb 2010 15:54:08 +0100, Stefan Reinauer stepan@coresystems.de wrote:
On 2/22/10 1:56 PM, Joseph Smith wrote:
I use die in raminit for memory compatibility checks. If the memory is
not
compatible, there is no use moving on...so we die().
I agree... to some extent...
The file I posted does not change the explicit die() calls though but only the implicit die() in the two macros ASSERT and BUG.
Oh ok.
As for memory compatibility, I saw the same thing on i945 - it does not support ECC RAM and some closed source BIOSes just die() in the case of ECC memory detected. However, it is possible to use the memory in non-ECC mode, so not using die() might still be worth thinking about in such a case. Worst case ram init does not work and then coreboot die()s automatically when it tries jumping to stage 2 in RAM. It should, however, always print a big fat BIOS_ERR warning in that case.
Yes, whether you die() or try to get past you should always print_err() a big fat message and then if it needs to die("HALT\r\n")