On Tue, Feb 14, 2012 at 7:08 PM, Ian Campbell <Ian.Campbell@citrix.com> wrote:
On Tue, 2012-02-14 at 00:33 +0000, Kevin O'Connor wrote:
On Mon, Feb 13, 2012 at 08:50:56PM +0000, Ian Campbell wrote:
On Mon, 2012-02-13 at 23:21 +0900, Daniel Castro wrote:
Hello,
I have encountered something a little strange: if I set the debug level to 3 or more I get a triple VCPU fault. If I set it to 1 the BIOS runs normally, but I lose a lot of information that I need to debug. Sometimes, if I try to print char * variables, I still get the fault regardless of the debug level.
Any ideas why?
My guess is that there is a debug print at lvl>=3 which ends up dereferencing a NULL pointer in one of its arguments (probably a %s) and this leads to a page fault. This in turn leads to a double fault because SeaBIOS does not install a page fault handler and then a triple fault because it also does not install a double fault handler. Likewise when you are printing "char * variables regardless of the debug level".
SeaBIOS doesn't have paging enabled, so it should not need to install a page fault handler.
Doh, yes you are obviously right!
In my defence, when running virtualised, paging may actually be enabled contrary to what the guest thinks is going on (I think this is needed in order to run real-mode code on EPT with a 1-1 map).
Really the hypervisor should completely hide this from the guest. I'm not actually sure what Xen does but it may well take the easy way out and rely on the BIOS not faulting... It still ought to print at least the faulting address and IP on triple fault though. It may be useful for Daniel to patch xen/arch/x86/hvm/hvm.c:hvm_triple_fault to add this information.
SeaBIOS needs to write the real-mode interrupt descriptor table to address 0, so it should definitely have read/write access to the memory there. Thus, a null pointer dereference shouldn't cause a fault. Indeed, I can't think of much that should cause a fault (other than read/write to IO memory incorrectly, divide by zero, invalid opcode, etc.).
An invalid pointer other than NULL might also do it, e.g. I think Xen scrubs memory (in a debug build) to something like 0xcc.
In that case a NULL check won't work but I suppose one could use a patch which treats %s as %p for the purposes of debugging it...
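For illustration, here is a rough sketch of the "%s as %p" idea: the formatter prints the pointer's value in hex and never dereferences it, so a scrubbed pointer such as 0xcccccccc shows up in the log instead of being followed. This is not the SeaBIOS code; dbgbuf, dbg_putc, dbg_putptr and dbg_printf are hypothetical names invented for this example, and the output goes to a plain buffer rather than a debug port.

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical bounded output buffer standing in for the debug port. */
struct dbgbuf {
    char *buf;    /* destination, must be NUL-terminated on entry */
    size_t size;  /* capacity including the terminating NUL */
    size_t len;   /* characters written so far */
};

/* Append one character, silently dropping it if the buffer is full. */
static void dbg_putc(struct dbgbuf *o, char c)
{
    if (o->len + 1 < o->size) {
        o->buf[o->len++] = c;
        o->buf[o->len] = '\0';
    }
}

/* Print the pointer's value as "0x" plus fixed-width hex.
 * The pointer itself is never dereferenced. */
static void dbg_putptr(struct dbgbuf *o, const void *p)
{
    static const char digits[] = "0123456789abcdef";
    uintptr_t v = (uintptr_t)p;
    int shift;

    dbg_putc(o, '0');
    dbg_putc(o, 'x');
    for (shift = (int)(sizeof(v) * 8) - 4; shift >= 0; shift -= 4)
        dbg_putc(o, digits[(v >> shift) & 0xf]);
}

/* Debug-only formatter: understands %s, %p and %%, and handles
 * %s exactly like %p so a bad string pointer cannot fault. */
static void dbg_printf(struct dbgbuf *o, const char *fmt, ...)
{
    va_list args;

    assert(o != NULL && o->buf != NULL && o->size > 0);
    va_start(args, fmt);
    for (; *fmt; fmt++) {
        if (*fmt != '%') {
            dbg_putc(o, *fmt);
            continue;
        }
        fmt++;
        if (*fmt == '\0')
            break;
        if (*fmt == 's' || *fmt == 'p')
            dbg_putptr(o, va_arg(args, const void *));
        else
            dbg_putc(o, *fmt);
    }
    va_end(args);
}
```

With something like this in place, a call such as dbg_printf(&o, "state=%s", state) would log the address held in state, which should be enough to tell a NULL pointer from a scrubbed or otherwise bogus one.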
You could test this by adding an explicit check for null in the bit of bvprintf which handles %s, perhaps putc()ing "(null)" instead.
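A minimal sketch of what such a check could look like, assuming a formatter that only understands %s and %%. It is not the actual bvprintf; outbuf, out_putc, out_puts and mini_printf are hypothetical names for this example, and the output is collected in a buffer so the result can be inspected.

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical bounded output buffer standing in for the debug port. */
struct outbuf {
    char *buf;    /* destination, must be NUL-terminated on entry */
    size_t size;  /* capacity including the terminating NUL */
    size_t len;   /* characters written so far */
};

/* Append one character, silently dropping it if the buffer is full. */
static void out_putc(struct outbuf *o, char c)
{
    if (o->len + 1 < o->size) {
        o->buf[o->len++] = c;
        o->buf[o->len] = '\0';
    }
}

/* Emit a string, substituting "(null)" for a NULL pointer
 * instead of dereferencing it. */
static void out_puts(struct outbuf *o, const char *s)
{
    if (s == NULL)
        s = "(null)";
    while (*s)
        out_putc(o, *s++);
}

/* Tiny formatter: handles only %s and %%. */
static void mini_printf(struct outbuf *o, const char *fmt, ...)
{
    va_list args;

    assert(o != NULL && o->buf != NULL && o->size > 0);
    va_start(args, fmt);
    for (; *fmt; fmt++) {
        if (*fmt != '%') {
            out_putc(o, *fmt);
            continue;
        }
        fmt++;
        if (*fmt == '\0')
            break;
        if (*fmt == 's')
            out_puts(o, va_arg(args, const char *));
        else
            out_putc(o, *fmt);
    }
    va_end(args);
}
```

If "(null)" then shows up in the debug output at lvl>=3, that points at the argument which was NULL. Note that this only catches NULL itself, not other invalid pointers.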
If you think it is specific to the Xen handling, one could also try running the same code on qemu to verify it.
Also trying the underlying SeaBIOS version without any local patches would be a good idea if you haven't already.
Well, I suspected some limitation on the stack or something like that, so I decided to divide the code into a succession of function calls, for example: int share_vbd(char *device); int share_vbd2(char *device, char *state); int share_vbd3(char *device, char *state, char *back_end_path); etc...
Anyway, now the fault is not present. It is the same code, just called as a succession of functions... So my best guess is that I was overrunning the stack.
Thank you all for the suggestions, I will implement Ian's suggestions.
Daniel
Ian.