On 2009/11/17 at 10:37, Tarl Neustaedter Tarl.Neustaedter@Sun.COM wrote:
[...]
Based on that output, it's hard for me to tell what about "open-package" failed
Yup. What I spot:
: mount-root ( Empty )
00000000ffe35d28: boot-dev$ ( ffe357a8 6 )
00000000ffe35d30: fs-pkg$ ( ffe357a8 6 ffe317c0 10 )
00000000ffe35d38: $open-package
: $open-package ( ffe357a8 6 ffe317c0 10 )
It looks like fs-pkg$ names the package it's opening. I'm guessing that's hsfs-file-system, but you should probably verify that's what is in that string (do a "2dup type" before calling $open-package). If that's correct, then rather than descending into open-package, you probably want to put a breakpoint at "open" in /packages/hsfs-file-system, which is where that will end up after much wandering around. Then proceed from there - you really don't want to trace through instance creation.
Okay, I was able to do a "2dup type" and verify that hsfs-file-system is the name of the file that it's trying to open.
Again, forgive my ignorance (being new to this whole development process), but what's the best way to insert a breakpoint where you've suggested?
Thanks! -Nick
The reason you weren't able to see all that (which would have been overwhelming) is a bit further down:
00000000ffe135f0: (lit) ( 1 ffe2fe38 ffe4b1d8 ffe4b548 ffe135e0 4 ffe4b558 ffe13310 )
00000000ffe13600: catch ( 1 ffe2fe38 ffe4b1d8 ffe4b548 ffffffffffffffff 0 )
That's basically a subroutine call to ffe13310, which it didn't show.
-------- This e-mail may contain confidential and privileged material for the sole use of the intended recipient. If this email is not intended for you, or you are not responsible for the delivery of this message to the intended recipient, please note that this message may contain SEAKR Engineering (SEAKR) Privileged/Proprietary Information. In such a case, you are strictly prohibited from downloading, photocopying, distributing or otherwise using this message, its contents or attachments in any way. If you have received this message in error, please notify us immediately by replying to this e-mail and delete the message from your mailbox. Information contained in this message that does not relate to the business of SEAKR is neither endorsed by nor attributable to SEAKR.
Nick Couchman wrote:
[...] Okay, I was able to do a "2dup type" and verify that hsfs-file-system is the name of the file that it's trying to open.
Again, forgive my ignorance (being new to this whole development process), but what's the best way to insert a breakpoint where you've suggested?
At a guess,
ok dev /packages/hsfs-file-system
ok debug open
ok dev
Tarl Neustaedter wrote:
At a guess,
ok dev /packages/hsfs-file-system
ok debug open
ok dev
That's correct, i.e. you need to change to the package you wish to debug and then invoke "debug <foo>" on the word you wish to step through.
Note that one thing I have found with the debugger is that you may need to manually add breakpoints for methods called during package opening using the above method, rather than being able to use "U" and "D" interactively.
AFAICT this is because the debugger needs to locate the start and end of a word in the wordlist, and if the package open fails then the new wordlist isn't set, and hence the word can't be located and added to the debug word list.
HTH,
Mark.
On 2009/11/18 at 10:37, Mark Cave-Ayland mark.cave-ayland@siriusit.co.uk wrote:
[...]
(I apologize in advance - the output is kind of long, but I wanted to make sure I posted everything relevant...)
Here are the results of the effort to debug "open":
0 > boot
[sparc64] Booting file 'cdrom' with parameters ''
Not a bootable ELF image
Not a Linux kernel image
Not a bootable a.out image
Loading FCode image...
Loaded 7120 bytes
entry point is 0x4000
Evaluating FCode...
read-file isn't unique.
: open ( ffffffffffffffff 1 0 ffffffffffffffff 0 0 0 ffe0b7c0 0 0 0 0 0 ffe3a120 )
00000000ffe33c78: my-args ( ffffffffffffffff 1 0 ffffffffffffffff 0 0 0 ffe0b7c0 0 0 0 0 0 ffe3a120 ffe493c0 7 )
00000000ffe33c80: dev-open ( ffffffffffffffff 1 0 ffffffffffffffff 0 0 0 ffe0b7c0 0 0 0 0 0 ffe3a120 ffe4c230 )
00000000ffe33c88: dup ( ffffffffffffffff 1 0 ffffffffffffffff 0 0 0 ffe0b7c0 0 0 0 0 0 ffe3a120 ffe4c230 ffe4c230 )
00000000ffe33c90: 0= ( ffffffffffffffff 1 0 ffffffffffffffff 0 0 0 ffe0b7c0 0 0 0 0 0 ffe3a120 ffe4c230 0 )
00000000ffe33c98: do?branch ( ffffffffffffffff 1 0 ffffffffffffffff 0 0 0 ffe0b7c0 0 0 0 0 0 ffe3a120 ffe4c230 )
00000000ffe33cb0: (lit) ( ffffffffffffffff 1 0 ffffffffffffffff 0 0 0 ffe0b7c0 0 0 0 0 0 ffe3a120 ffe4c230 ffe314e0 )
00000000ffe33cc0: (to) ( ffffffffffffffff 1 0 ffffffffffffffff 0 0 0 ffe0b7c0 0 0 0 0 0 ffe3a120 )
00000000ffe33cc8: initialize seek failed
Can't mount root
byte-load: exception caught! ok
So, open is failing during "initialize" - if I debug that:
0 > do-boot
: open ( Empty )
00000000ffe33c78: my-args ( ffe493c0 7 )
00000000ffe33c80: dev-open ( ffe4c7e8 )
00000000ffe33c88: dup ( ffe4c7e8 ffe4c7e8 )
00000000ffe33c90: 0= ( ffe4c7e8 0 )
00000000ffe33c98: do?branch ( ffe4c7e8 )
00000000ffe33cb0: (lit) ( ffe4c7e8 ffe314e0 )
00000000ffe33cc0: (to) ( Empty )
00000000ffe33cc8: initialize
: initialize ( Empty )
00000000ffe33448: /sector ( 800 )
00000000ffe33450: mem-alloc ( 8004000 )
00000000ffe33458: (lit) ( 8004000 ffe31508 )
00000000ffe33468: (to) ( Empty )
00000000ffe33470: get-vol-desc seek failed

Can't mount root
Aborted.
get-vol-desc now appears to be the culprit:
0 > do-boot
: open ( Empty )
00000000ffe33c78: my-args ( ffe493c0 7 )
00000000ffe33c80: dev-open ( ffe4d358 )
00000000ffe33c88: dup ( ffe4d358 ffe4d358 )
00000000ffe33c90: 0= ( ffe4d358 0 )
00000000ffe33c98: do?branch ( ffe4d358 )
00000000ffe33cb0: (lit) ( ffe4d358 ffe314e0 )
00000000ffe33cc0: (to) ( Empty )
00000000ffe33cc8: initialize
: initialize ( Empty )
00000000ffe33448: /sector ( 800 )
00000000ffe33450: mem-alloc ( 8008000 )
00000000ffe33458: (lit) ( 8008000 ffe31508 )
00000000ffe33468: (to) ( Empty )
00000000ffe33470: get-vol-desc
: get-vol-desc ( Empty )
00000000ffe318b8: vol-desc ( 8008000 )
00000000ffe318c0: /sector ( 8008000 800 )
00000000ffe318c8: vol-desc-sector# ( 8008000 800 10 )
00000000ffe318d0: /sector ( 8008000 800 10 800 )
00000000ffe318d8: * ( 8008000 800 8000 )
00000000ffe318e0: dev-ih ( 8008000 800 8000 ffe4d358 )
00000000ffe318e8: read-disk seek failed

Can't mount root
Aborted.
read-disk:
0 > do-boot
: open ( Empty )
00000000ffe33c78: my-args ( ffe493c0 7 )
00000000ffe33c80: dev-open ( ffe4d910 )
00000000ffe33c88: dup ( ffe4d910 ffe4d910 )
00000000ffe33c90: 0= ( ffe4d910 0 )
00000000ffe33c98: do?branch ( ffe4d910 )
00000000ffe33cb0: (lit) ( ffe4d910 ffe314e0 )
00000000ffe33cc0: (to) ( Empty )
00000000ffe33cc8: initialize
: initialize ( Empty )
00000000ffe33448: /sector ( 800 )
00000000ffe33450: mem-alloc ( 800a000 )
00000000ffe33458: (lit) ( 800a000 ffe31508 )
00000000ffe33468: (to) ( Empty )
00000000ffe33470: get-vol-desc
: get-vol-desc ( Empty )
00000000ffe318b8: vol-desc ( 800a000 )
00000000ffe318c0: /sector ( 800a000 800 )
00000000ffe318c8: vol-desc-sector# ( 800a000 800 10 )
00000000ffe318d0: /sector ( 800a000 800 10 800 )
00000000ffe318d8: * ( 800a000 800 8000 )
00000000ffe318e0: dev-ih ( 800a000 800 8000 ffe4d910 )
00000000ffe318e8: read-disk
: read-disk seek failed

Can't mount root
Aborted.
So, for some reason, it cannot step into read-disk for debugging. If I do "see read-disk":
0 > see read-disk
: read-disk
  dup >r 0 swap cif-seek if " seek failed" die
  tuck swap r> cif-read <> if " read failed" die
  ; ok
And all of these seem to be primitive words (swap, cif-seek, etc.) that cannot be debugged. I would guess "cif-seek" is where it's failing, which looks like this:
0 > see cif-seek
defer cif-seek
is seek
ok
And I can't seem to go any deeper than that into debugging. I'm guessing at this point maybe this is where it gets to be a Qemu issue - that there's something about the seek command in the hardware emulation itself that's failing?
Thanks, again, everyone, for all the help!
-Nick
Nick Couchman wrote:
[...] So, for some reason, it cannot step into read-disk for debugging. If I do "see read-disk":
0 > see read-disk
: read-disk
  dup >r 0 swap cif-seek if " seek failed" die
  tuck swap r> cif-read <> if " read failed" die
  ; ok
And, all of these seem to be primitive words - e.g. swap, cif-seek, etc., cannot be debugged. I would guess "cif-seek" is where it's failing, which looks like this:
These are part of the Forth kernel, where it does file handling operations - they eventually end up generating reads to the HBA.
Where you are in the code itself - line 209 of:
http://src.opensolaris.org/source/xref/onnv/onnv-gate/usr/src/psm/stand/boot... lks/common/util.fth
So we're doing the first cif-seek, which is the client-interface seek. See section 6.3.2.3 in IEEE 1275. That causes the disk to be read so you can cache disk accesses. The question is whether the arguments we are sending are reasonable. What we saw:
00000000ffe318e0: dev-ih ( 8008000 800 8000 ffe4d358 )
00000000ffe318e8: read-disk
Those arguments we see on the stack are adr, len, off, ihandle. ffe4d358 is a reasonable address for an ihandle. 0x8000 is a reasonable byte offset into the disk to read - and that's all that seek takes (the adr, len get used later by the cif-read, which we don't reach). So somehow, trying to seek to byte offset 0x8000 on the disk is failing - don't know why.
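The offset in that trace can be checked by hand. The following is only a small Python sketch of the arithmetic, using the values visible in the get-vol-desc trace; the Python names are illustrative labels, not identifiers from the boot code.

```python
# Values copied from the get-vol-desc trace. The names are illustrative
# labels chosen for this sketch, not identifiers from the boot code.
SECTOR_SIZE = 0x800       # value left on the stack by /sector
VOL_DESC_SECTOR = 0x10    # value left on the stack by vol-desc-sector#


def byte_offset(sector, sector_size=SECTOR_SIZE):
    """Byte offset of a sector, as computed by `vol-desc-sector# /sector *`."""
    return sector * sector_size


# Stack on entry to read-disk, bottom to top: ( adr len off ihandle )
stack = [0x8008000, 0x800, 0x8000, 0xFFE4D358]
adr, length, off, ihandle = stack

print(hex(byte_offset(VOL_DESC_SECTOR)))   # 0x8000
print(off == byte_offset(VOL_DESC_SECTOR))  # True
```

Sector 16 with 2048-byte sectors is where an ISO 9660 image keeps its first volume descriptor, so the offset itself looks like what a CD file system reader would ask for.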
On 2009/11/18 at 11:24, Tarl Neustaedter Tarl.Neustaedter@sun.com wrote:
[...]
Well, I thought maybe I could use a SCSI-based CD-ROM in Qemu and see if it was the HBA emulation code causing the problem, but qemu doesn't seem to want to boot with a SCSI controller:
OpenBIOS for Sparc64
Cannot manage 'SCSI bus controller' PCI device type '<NULL>': 1000 12 (1 0 0)
Segmentation fault
-Nick
On Wed, Nov 18, 2009 at 9:39 PM, Nick Couchman Nick.Couchman@seakr.com wrote:
[...]
Well, I thought maybe I could use a SCSI-based CD-ROM in Qemu and see if it was the HBA emulation code causing the problem, but qemu doesn't seem to want to boot with a SCSI controller:
OpenBIOS for Sparc64
Cannot manage 'SCSI bus controller' PCI device type '<NULL>': 1000 12 (1 0 0)
Segmentation fault
That's OpenBIOS: there is no driver for the HBA. There's a driver for ESP (Sparc32), but I don't think there is a PCI card with an ESP chipset.
Tarl Neustaedter wrote:
So we're doing the first cif-seek, which is the client-interface seek. See section 6.3.2.3 in IEEE 1275. That causes the disk to be read so you can cache disk accesses. The question is whether the arguments we are sending are reasonable. What we saw:
00000000ffe318e0: dev-ih ( 8008000 800 8000 ffe4d358 )
00000000ffe318e8: read-disk
Those arguments we see on the stack are adr, len, off, ihandle. ffe4d358 is a reasonable address for an ihandle. 0x8000 is a reasonable byte offset into the disk to read - and that's all that seek takes (the adr, len get used later by the cif-read, which we don't reach). So somehow, trying to seek to byte offset 0x8000 on the disk is failing - don't know why.
Here's what I get on my Milax CD image here:
: read-disk ( 8002000 800 8000 ffe4adc8 )
00000000ffe31340: dup ( 8002000 800 8000 ffe4adc8 ffe4adc8 )
00000000ffe31348: >r ( 8002000 800 8000 ffe4adc8 )
00000000ffe31350: 0 ( 8002000 800 8000 ffe4adc8 0 )
00000000ffe31358: swap ( 8002000 800 8000 0 ffe4adc8 )
00000000ffe31360: cif-seek
: seek ( 8002000 800 8000 0 ffe4adc8 )
00000000ffe280d0: swap ( 8002000 800 8000 ffe4adc8 0 )
00000000ffe280d8: rot ( 8002000 800 ffe4adc8 0 8000 )
00000000ffe280e0: dup ( 8002000 800 ffe4adc8 0 8000 8000 )
00000000ffe280e8: ihandle>phandle ( 8002000 800 ffe4adc8 0 8000 0 )
00000000ffe280f0: (") ( 8002000 800 ffe4adc8 0 8000 0 ffe28100 4 )
00000000ffe28108: rot ( 8002000 800 ffe4adc8 0 8000 ffe28100 4 0 )
00000000ffe28110: find-method ( 8002000 800 ffe4adc8 0 8000 0 )
00000000ffe28118: do?branch ( 8002000 800 ffe4adc8 0 8000 )
00000000ffe28148: 3drop ( 8002000 800 )
00000000ffe28150: -1 ( 8002000 800 ffffffffffffffff )
00000000ffe28158: (semis)
[ Finished seek ] ( 8002000 800 ffffffffffffffff )
00000000ffe31368: do?branch ( 8002000 800 )
00000000ffe31378: (") ( 8002000 800 ffe31388 b )
00000000ffe31398: die seek failed

Can't mount root
Aborted.
Hmmm it looks to me as if the client interface seek word is expecting the arguments in a different order - I would expect ihandle>phandle to be executed on ffe4adc8, not on 0.
HTH,
Mark.
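The stack shuffling in that trace can be replayed outside the firmware. This is a rough Python model of the three Forth words `swap rot dup` at the start of seek, not OpenBIOS code; the helper names are made up for the sketch. It shows which cell ends up on top, and therefore which cell ihandle>phandle receives, for the order read-disk actually passes and for the order the seek word appears to expect.

```python
def swap(s):
    """Forth swap: ( a b -- b a )"""
    s[-2], s[-1] = s[-1], s[-2]


def rot(s):
    """Forth rot: ( a b c -- b c a )"""
    s.append(s.pop(-3))


def dup(s):
    """Forth dup: ( a -- a a )"""
    s.append(s[-1])


def seek_prologue(stack):
    """Apply `swap rot dup` to a copy of the stack.

    Returns the top cell (what ihandle>phandle would be handed)
    and the resulting stack, both bottom to top.
    """
    s = list(stack)
    swap(s)
    rot(s)
    dup(s)
    return s[-1], s


IHANDLE = 0xFFE4ADC8

# Order seen in the trace on entry to seek: ( pos_lo pos_hi ihandle )
got, got_stack = seek_prologue([0x8000, 0x0, IHANDLE])

# Order the seek word appears to expect: ( ihandle pos_hi pos_lo )
want, want_stack = seek_prologue([IHANDLE, 0x0, 0x8000])

print(hex(got), hex(want))  # 0x8000 0xffe4adc8
```

With the order from the trace, the cell handed to ihandle>phandle is the byte offset 0x8000 rather than the ihandle, which matches the 0 it returns and the find-method failure that follows.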
Mark Cave-Ayland wrote:
Hmmm it looks to me as if the client interface seek word is expecting the arguments in a different order - I would expect ihandle>phandle to be executed on ffe4adc8, not on 0.
Referring to the IEEE-1275 spec, the arguments for seek are listed as:
IN: ihandle, pos.hi, pos.lo
whereas the comments in forth/system/ciface.fs say:
( ihandle pos_hi pos_lo -- status )
Has OpenBIOS misinterpreted the spec, in that arguments in the OF spec should read top of the stack to bottom of the stack from left to right, rather than the other way around? Then again, if this were the case, would other OpenBIOS-based client interfaces not have discovered this before?
ATB,
Mark.
Mark Cave-Ayland wrote:
Mark Cave-Ayland wrote:
Hmmm it looks to me as if the client interface seek word is expecting the arguments in a different order - I would expect ihandle>phandle to be executed on ffe4adc8, not on 0.
Referring to the IEEE-1275 spec, the arguments for seek are listed as:
IN: ihandle, pos.hi, pos.lo
whereas the comments in forth/system/ciface.fs say:
( ihandle pos_hi pos_lo -- status )
Has OpenBIOS misinterpreted the spec, in that arguments in the OF spec should read top of the stack to bottom of the stack from left to right, rather than the other way around? Then again, if this were the case, would other OpenBIOS-based client interfaces not have discovered this before?
That looks incorrect. The Client Interface part of the specification shows the arguments in backwards order (C order :-) compared to the rest of the specification. At the top of section 6.3.1, it states arguments are specified in the order arg1, ... argn. This is backwards from normal Forth notation, which is argn, ... arg2, arg1. So it appears there is an error in forth/system/ciface.fs. The corresponding code in Sun's OBP (obp/os/bootprom/clientif.fth) shows:
cif: seek ( low,high ihandle -- status )
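The ordering rule described above can be written down as a tiny Python sketch. It assumes only what is stated in this thread: client-interface arguments are listed arg1 ... argn, while a Forth stack comment lists them argn ... arg1 with arg1 on top. The function names are hypothetical and only one result cell is handled.

```python
def cif_args_to_stack(args):
    """Client-interface list (arg1 ... argN) -> Forth stack order, bottom to top.

    arg1 ends up on top of the stack, so the list is simply reversed.
    """
    return list(reversed(args))


def stack_comment(args, result):
    """Render a Forth-style stack comment for a client-interface service."""
    return "( " + " ".join(cif_args_to_stack(args)) + " -- " + result + " )"


print(stack_comment(["ihandle", "pos.hi", "pos.lo"], "status"))
# ( pos.lo pos.hi ihandle -- status )
print(stack_comment(["ihandle", "addr", "len"], "actual"))
# ( len addr ihandle -- actual )
```

The first line agrees with the OBP stack comment quoted above for seek.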
On 2009/11/18 at 14:44, Tarl Neustaedter Tarl.Neustaedter@Sun.COM wrote:
[...]
So, is this a change that needs to be made in the OpenBIOS forth code, in the Qemu code, or somewhere else??
-Nick
Nick Couchman wrote:
[....] cif: seek ( low,high ihandle -- status )
So, is this a change that needs to be made in the OpenBIOS forth code, in the Qemu code, or somewhere else??
It looks like the Forth code is simply wrong. The first few tokens in forth/system/ciface.fs:seek are clearly expecting arguments in the wrong order ("swap rot dup ihandle>phandle" expects ihandle in the 3rd argument rather than 1st). The question is what in OpenBIOS depends on that misbehaviour, and there I am very much out of my depth.
Tarl Neustaedter wrote:
It looks like the Forth code is simply wrong. The first few tokens in forth/system/ciface.fs:seek are clearly expecting arguments in the wrong order ("swap rot dup ihandle>phandle" expects ihandle in the 3rd argument rather than 1st). The question is what in OpenBIOS depends on that misbehaviour, and there I am very much out of my depth.
Nothing _in_ OpenBIOS should be using client interface methods. If it did, we should fix that as we proceed.
With such a bug I wonder, however, how we could ever boot a client.
Attached patch should improve the behavior. It's untested though, possibly additional cleanup is needed after call-method...
reverse order of parameters for 6.3.2.3 Device I/O client interface functions.
Signed-off-by: Stefan Reinauer stepan@coresystems.de
Index: forth/system/ciface.fs
===================================================================
--- forth/system/ciface.fs (revision 613)
+++ forth/system/ciface.fs (working copy)
@@ -212,21 +212,16 @@
   close-dev
 ;

-: read ( ihandle addr len -- actual )
-  rot dup ihandle>phandle " read" rot find-method
-  if swap call-package else 3drop -1 then
+: read ( len addr ihandle -- actual )
+  rot swap " read" call-method
 ;

-: write ( ihandle addr len -- actual )
-  rot dup ihandle>phandle " write" rot find-method
-  if swap call-package else 3drop -1 then
+: write ( len addr ihandle -- actual )
+  rot swap " write" call-method
 ;

-: seek ( ihandle pos_hi pos_lo -- status )
-  \ package methods uses ( pos_lo pos_hi -- status )
-  swap
-  rot dup ihandle>phandle " seek" rot find-method
-  if swap call-package else 3drop -1 then
+: seek ( pos_lo pos_hi ihandle -- status )
+  " seek" call-method
 ;

@@ -261,7 +256,7 @@

 : interpret ( xxx cmdstring -- ??? catch-reult )
   dup cstrlen
-  \ ." INTERPRETE: --- " 2dup type
+  \ ." INTERPRET: --- " 2dup type
   ['] evaluate catch dup if
     \ this is not necessary an error...
     ." interpret: exception " dup . ." caught" cr
Stefan Reinauer wrote:
Nothing _in_ OpenBIOS should be using client interface methods. If it did, we should fix that as we proceed.
With such a bug I wonder, however, how we could ever boot a client.
Yeah, that was one of my thoughts. Do we have a complete list of clients that have booted from OpenBIOS anywhere?
Attached patch should improve the behavior. It's untested though, possibly additional cleanup is needed after call-method...
Things get very slightly further, but it now throws an exception -21 (method not found):
Evaluating FCode... call-method : exception -21
seek failed
Can't mount root
byte-load: exception caught!
Here's the debug output with your patch applied:
: seek ( 8002000 800 8000 0 ffe4adc8 )
00000000ffe28030: (") ( 8002000 800 8000 0 ffe4adc8 ffe28040 4 )
00000000ffe28048: call-method
: call-method ( 8002000 800 8000 0 ffe4adc8 ffe28040 4 )
00000000ffe27db8: dup ( 8002000 800 8000 0 ffe4adc8 ffe28040 4 4 )
00000000ffe27dc0: 0= ( 8002000 800 8000 0 ffe4adc8 ffe28040 4 0 )
00000000ffe27dc8: do?branch ( 8002000 800 8000 0 ffe4adc8 ffe28040 4 )
00000000ffe27e18: dup ( 8002000 800 8000 0 ffe4adc8 ffe28040 4 4 )
00000000ffe27e20: >r ( 8002000 800 8000 0 ffe4adc8 ffe28040 4 )
00000000ffe27e28: dup ( 8002000 800 8000 0 ffe4adc8 ffe28040 4 4 )
00000000ffe27e30: cstrlen ( 8002000 800 8000 0 ffe4adc8 ffe28040 4 0 )
00000000ffe27e38: rot ( 8002000 800 8000 0 ffe4adc8 4 0 ffe28040 )
00000000ffe27e40: ?ihandle ( 8002000 800 8000 0 ffe4adc8 4 0 ffe28040 )
00000000ffe27e48: (lit) ( 8002000 800 8000 0 ffe4adc8 4 0 ffe28040 ffe130d8 )
00000000ffe27e58: catch ( 8002000 800 8000 0 ffe4adc8 9 ffffffffffffffdf ffe06d78 ffffffffffffffdf )
00000000ffe27e60: dup ( 8002000 800 8000 0 ffe4adc8 9 ffffffffffffffdf ffe06d78 ffffffffffffffdf ffffffffffffffdf )
00000000ffe27e68: do?branch ( 8002000 800 8000 0 ffe4adc8 9 ffffffffffffffdf ffe06d78 ffffffffffffffdf )
00000000ffe27e78: (") ( 8002000 800 8000 0 ffe4adc8 9 ffffffffffffffdf ffe06d78 ffffffffffffffdf ffe27e88 c )
00000000ffe27e98: type call-method ( 8002000 800 8000 0 ffe4adc8 9 ffffffffffffffdf ffe06d78 ffffffffffffffdf )
00000000ffe27ea0: r@ ( 8002000 800 8000 0 ffe4adc8 9 ffffffffffffffdf ffe06d78 ffffffffffffffdf 4 )
00000000ffe27ea8: dup ( 8002000 800 8000 0 ffe4adc8 9 ffffffffffffffdf ffe06d78 ffffffffffffffdf 4 4 )
00000000ffe27eb0: cstrlen ( 8002000 800 8000 0 ffe4adc8 9 ffffffffffffffdf ffe06d78 ffffffffffffffdf 4 0 )
00000000ffe27eb8: type ( 8002000 800 8000 0 ffe4adc8 9 ffffffffffffffdf ffe06d78 ffffffffffffffdf )
00000000ffe27ec0: (") ( 8002000 800 8000 0 ffe4adc8 9 ffffffffffffffdf ffe06d78 ffffffffffffffdf ffe27ed0 c )
00000000ffe27ee0: type : exception ( 8002000 800 8000 0 ffe4adc8 9 ffffffffffffffdf ffe06d78 ffffffffffffffdf )
00000000ffe27ee8: dup ( 8002000 800 8000 0 ffe4adc8 9 ffffffffffffffdf ffe06d78 ffffffffffffffdf ffffffffffffffdf )
00000000ffe27ef0: . -21 ( 8002000 800 8000 0 ffe4adc8 9 ffffffffffffffdf ffe06d78 ffffffffffffffdf )
00000000ffe27ef8: cr ( 8002000 800 8000 0 ffe4adc8 9 ffffffffffffffdf ffe06d78 ffffffffffffffdf )
00000000ffe27f00: r> ( 8002000 800 8000 0 ffe4adc8 9 ffffffffffffffdf ffe06d78 ffffffffffffffdf 4 )
00000000ffe27f08: drop ( 8002000 800 8000 0 ffe4adc8 9 ffffffffffffffdf ffe06d78 ffffffffffffffdf )
00000000ffe27f10: (semis)
[ Finished call-method ] ( 8002000 800 8000 0 ffe4adc8 9 ffffffffffffffdf ffe06d78 ffffffffffffffdf )
00000000ffe28050: (semis)
[ Finished seek ] seek failed

Can't mount root
Aborted.
0 >
At first glance, it looks as if cstrlen is returning 0, which is probably stopping the method name from being set correctly.
HTH,
Mark.
Mark Cave-Ayland wrote:
[...] On first glance, it looks as if cstrlen is returning 0 which is probably stopping the method name from being set correctly.
That code too is completely wrong. The arguments for $call-method are ( adr len ihandle ), and it appears to be passing ( ihandle adr len ).
It looks like $call-method is doing the right thing, at least - which means it chokes trying to use the "len" as an ihandle.
So, does call-method in ciface.fs need to be modified to send the arguments to $call-method correctly? Or is it seek in ciface.fs that needs to be corrected?
-Nick
On 2009/11/19 at 03:31, Tarl Neustaedter Tarl.Neustaedter@Sun.COM wrote:
[...]
Nick Couchman wrote:
So, does call-method in ciface.fs need to be modified to send the arguments to $call-method correctly? Or is it seek in ciface.fs that needs to be corrected?
It looks like call-method is correct, it was the call from seek that was wrong.
On 2009/11/19 at 09:13, Tarl Neustaedter Tarl.Neustaedter@Sun.COM wrote:
Nick Couchman wrote:
So, does call-method in ciface.fs need to be modified to send the arguments to $call-method correctly? Or is it seek in ciface.fs that needs to be corrected?
It looks like call-method is correct, it was the call from seek that was wrong.
So, does it just need to be changed from:
" seek" call-method
to:
rot swap " seek" call-method
like read and write? It looks like the arguments to seek are a little different from read and write, and when I try changing it to that and rebuilding OpenBIOS, I get exactly the same exception -21.
(Sorry if that's a stupid mistake on my part - I'm very new to Forth, so I'm still trying to wrap my head around it :-).
-Nick
[...]
So, does it just need to be changed from:
" seek" call-method
to:
rot swap " seek" call-method
More like " seek" rot call-method
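The difference between the two orderings is easy to see by replaying them on a symbolic stack. This is a Python model of the Forth words only, under the assumption taken from this thread that $call-method wants ( ... adr len ihandle ) with the ihandle on top; the helper names are invented for the sketch.

```python
def rot(s):
    """Forth rot: ( a b c -- b c a )"""
    s.append(s.pop(-3))


def swap(s):
    """Forth swap: ( a b -- b a )"""
    s[-2], s[-1] = s[-1], s[-2]


def string_literal(s):
    """A Forth string literal such as " seek" pushes an address and a length."""
    s.extend(["adr", "len"])


def run(words):
    """Run a word sequence on the stack seek is entered with, bottom to top."""
    stack = ["pos_lo", "pos_hi", "ihandle"]
    for word in words:
        word(stack)
    return stack


first_try = run([rot, swap, string_literal])   # rot swap " seek"
suggested = run([string_literal, rot])         # " seek" rot

print(first_try)   # ['pos_hi', 'pos_lo', 'ihandle', 'adr', 'len']
print(suggested)   # ['pos_lo', 'pos_hi', 'adr', 'len', 'ihandle']
```

In the first variant the string length is still on top where the ihandle should be, which would explain getting the same exception again; in the second the ihandle is back on top with the method name directly beneath it.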
On 2009/11/19 at 09:40, Tarl Neustaedter Tarl.Neustaedter@Sun.COM wrote:
[...]
More like " seek" rot call-method
Doh!
Well, no more Forth exceptions, but I think Qemu is not liking it, now:
0 > boot cdrom
[sparc64] Booting file 'cdrom' with parameters ''
Not a bootable ELF image
Not a Linux kernel image
Not a bootable a.out image
Loading FCode image...
Loaded 7120 bytes
entry point is 0x4000
Evaluating FCode...
Unhandled Exception 0x0000000000000034
PC = 0x00000000ffd10de4 NPC = 0x00000000ffd10de8
Stopping execution
-Nick
On Thu, Nov 19, 2009 at 6:45 PM, Nick Couchman Nick.Couchman@seakr.com wrote:
[...]
Unhandled Exception 0x0000000000000034 PC = 0x00000000ffd10de4 NPC = 0x00000000ffd10de8 Stopping execution
This is an unaligned access exception. With GDB you could check which source line matches the PC value.
(gdb) l *0x00000000ffd10de4
0xffd10de4 is in fetch (../include/openbios/stack.h:34).
29 typedef ucell phandle_t;
30
31
32
33
34 static inline void PUSH(ucell value) {
35 dstack[++dstackcnt] = (value);
36 }
37 static inline void PUSH_xt( xt_t xt ) { PUSH( (ucell)xt ); }
38 static inline void PUSH_ih( ihandle_t ih ) { PUSH( (ucell)ih ); }
So, something about the PUSH function that it doesn't like??
-Nick
On Thu, Nov 19, 2009 at 9:07 PM, Nick Couchman Nick.Couchman@seakr.com wrote:
[...]
So, something about the PUSH function that it doesn't like??
More likely the address given to fetch was not aligned:
static void fetch(void)
{
    const ucell *aaddr = (ucell *)cell2pointer(POP());
    PUSH(read_ucell(aaddr));
}
Here QEMU can help: enable DEBUG_PCALL in target-sparc/op_helper.c and recompile. Then run QEMU with -d int, and /tmp/qemu.log will contain the register dump at the time of the exception.
I enabled DEBUG_PCALL as well as DEBUG_UNALIGNED in target-sparc/op_helper.c. Output for qemu was:
0 > boot cdrom
[sparc64] Booting file 'cdrom' with parameters ''
Not a bootable ELF image
Not a Linux kernel image
Not a bootable a.out image
Loading FCode image...
Loaded 7420 bytes
entry point is 0x4000
Evaluating FCode...
Unaligned access to 0x0000000000000014 from 0x00000000ffd10d9c
Unhandled Exception 0x0000000000000034
PC = 0x00000000ffd10de4 NPC = 0x00000000ffd10de8
Stopping execution
and /tmp/qemu.log contains this at the end:
Search PC...
4550: Unaligned Memory Access (v=0034)
pc=00000000ffd10de4 npc=00000000ffd10de8 SP=00000000fff10cd1
pc: 00000000ffd10de4 npc: 00000000ffd10de8
General Registers:
%g0: 0000000000000000 %g1: 00000000000000b8 %g2: 0000000000000014 %g3: 00000000ffee3d50
%g4: 00000000ffee3000 %g5: 0000000000000041 %g6: 0000000000000000 %g7: 0000000000000000
Current Register Window:
%o0: 000000000000003f %o1: 00000000ffe130f0 %o2: 0000000000000018 %o3: 00000000ffee3000
%o4: 00000000ffee3c00 %o5: 0000000000000210 %o6: 00000000fff10cd1 %o7: 00000000ffd0f068
%l0: 00000000ffee3000 %l1: 0000000000000000 %l2: 00000000ffee3000 %l3: 0000000000000000
%l4: 000000000000001a %l5: 0000000000000000 %l6: 0000000000000000 %l7: 0000000000000000
%i0: 00000000ffe1ac30 %i1: 0000000000000200 %i2: 00000000ffe00000 %i3: 0000000000000000
%i4: ffffffffffffffff %i5: 0000000000000018 %i6: 00000000fff10d91 %i7: 00000000ffd126f8
Floating Point Registers:
%f00: 000000000.000000 000000000.000000 000000000.000000 000000000.000000
%f04: 000000000.000000 000000000.000000 000000000.000000 000000000.000000
%f08: 000000000.000000 000000000.000000 000000000.000000 000000000.000000
%f12: 000000000.000000 000000000.000000 000000000.000000 000000000.000000
%f16: 000000000.000000 000000000.000000 000000000.000000 000000000.000000
%f20: 000000000.000000 000000000.000000 000000000.000000 000000000.000000
%f24: 000000000.000000 000000000.000000 000000000.000000 000000000.000000
%f28: 000000000.000000 000000000.000000 000000000.000000 000000000.000000
pstate: 0x00000016 ccr: 0x44 asi: 0x00 tl: 0 fprs: 0
cansave: 6 canrestore: 0 otherwin: 0 wstate 0 cleanwin 7 cwp 4
fsr: 0x00000000
-Nick
On 11/19/09 8:30 PM, Nick Couchman wrote:
[...]
Unaligned access to 0x0000000000000014 from 0x00000000ffd10d9c
Unhandled Exception 0x0000000000000034
This still very much looks like it's using the size of a string as an address somewhere :-(
Stefan
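A quick way to see why the reported address traps is to check it against the access width. The sketch below assumes a cell (ucell) is 8 bytes on sparc64, which the 64-bit values in the stack and register dumps suggest; the helper name is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumption: one Forth cell (ucell) is 8 bytes on sparc64. */
#define CELL_SIZE 8u

/* Returns 1 if addr is a multiple of size; size must be a power of two. */
static int is_aligned(uint64_t addr, size_t size)
{
    return (addr & (uint64_t)(size - 1)) == 0;
}
```

With these definitions, is_aligned(0x14, CELL_SIZE) is 0, so a cell-sized fetch from 0x14 faults, while is_aligned(0x14, 4) is 1. A value as small as 0x14 (decimal 20) is also far below any address seen elsewhere in the dumps, which fits the reading that a count or length was used where an address was expected.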
So, in trying to track this down further, I need some more help with the debugger in OpenBIOS. It seems that I have to go through the boot process at least once before I can debug certain things. For example, if I start up qemu and immediately type "debug do-boot", I'm told "could not locate word for debugging ok", and, when I try to boot, it doesn't allow me to step through do-boot - it just continues on. If I let the boot fail once, then try "debug do-boot" it allows me to step through it. Unfortunately, with this most recent error - the Unhandled Exception - the first time I boot is also the last time until I restart Qemu, making it very difficult to make it stop at the correct place in order to debug. So, is there a way that I can load the words before I actually attempt the boot so that OpenBIOS knows where to stop and so that I can track down where this unhandled exception is occurring?
-Nick
Nick Couchman wrote:
[...] So, in trying to track this down further, I need some more help with the debugger in OpenBIOS. It seems that I have to go through the boot process at least once before I can debug certain things. For example, if I start up qemu and immediately type "debug do-boot", I'm told "could not locate word for debugging ok",
Right. do-boot is defined by the primary bootblocks, the code it reads from blocks 1-15. If you're still booting from the Nevada CD, use boot cdrom -H, which will cause it to halt before it executes do-boot. You'll have to clean up -H from the arguments so you can get past the test for halt?.
I'd suggest:
ok boot cdrom -H
ok debug do-boot
ok do-boot
(step through until you see the test for halt?, then enter forth and set halt? to false).
On 2009/11/19 at 13:14, Tarl Neustaedter Tarl.Neustaedter@Sun.COM wrote:
[...]
(step through until you see the test for halt?, then enter forth and set halt? to false).
Unfortunately the unhandled exception occurs before this check is done:
0 > boot cdrom -H
[sparc64] Booting file 'cdrom' with parameters '-H'
Not a bootable ELF image
Not a Linux kernel image
Not a bootable a.out image
Loading FCode image...
Loaded 7420 bytes
entry point is 0x4000
Evaluating FCode...
Unaligned access to 0x0000000000000014 from 0x00000000ffd10d9c
Unhandled Exception 0x0000000000000034
PC = 0x00000000ffd10de4 NPC = 0x00000000ffd10de8
Stopping execution
And, yes, still booting from sol-nv-b127-sparc-dvd.iso file...
-Nick
Nick Couchman wrote:
[...] Unfortunately the unhandled exception occurs before this check is done:
Then we've broken something else, because we were getting past that before. We aren't getting to do-boot now.
What boot does is essentially:
<boot-device> open-dev 4000 swap " load" rot $call-method
(it's a bit more complicated, but that's the essence). We seem to be dying in either open-dev or "load" now.
On 2009/11/19 at 13:29, Tarl Neustaedter Tarl.Neustaedter@Sun.COM wrote:
[...]
We seem to be dying in either open-dev or "load" now.
So, using gdb, I've managed to come up with a bit more information - whether it's useful or not is beyond my ability to determine, but here goes...
I set up a breakpoint at fcode_load, because that's the C-based function where it appears to be failing. Once we're in that function, I step through:
Breakpoint 1, fcode_load (filename=0xffec3658 "cdrom") at ../arch/sparc64/fcodeload.c:21
21 if (!file_open(filename))
(gdb) next
25 file_seek(offset);
(gdb)
32 switch (fcode_header[0]) {
(gdb)
25 file_seek(offset);
(gdb)
26 if (lfile_read(&fcode_header, sizeof(fcode_header))
(gdb)
32 switch (fcode_header[0]) {
(gdb)
24 for (offset = 0; offset < 16 * 512; offset += 512) {
(gdb)
32 switch (fcode_header[0]) {
(gdb)
25 file_seek(offset);
(gdb)
26 if (lfile_read(&fcode_header, sizeof(fcode_header))
(gdb)
32 switch (fcode_header[0]) {
(gdb)
52 printf("Loading FCode image...\n");
(gdb)
50 start = 0x4000;
(gdb)
47 size = (fcode_header[4] << 24) | (fcode_header[5] << 16) |
(gdb)
52 printf("Loading FCode image...\n");
(gdb)
47 size = (fcode_header[4] << 24) | (fcode_header[5] << 16) |
(gdb)
52 printf("Loading FCode image...\n");
(gdb)
47 size = (fcode_header[4] << 24) | (fcode_header[5] << 16) |
(gdb)
52 printf("Loading FCode image...\n");
(gdb)
47 size = (fcode_header[4] << 24) | (fcode_header[5] << 16) |
(gdb)
52 printf("Loading FCode image...\n");
(gdb)
54 file_seek(offset + sizeof(fcode_header));
(gdb)
56 if ((unsigned long)lfile_read((void *)start, size) != size) {
(gdb)
61 debug("Loaded %lu bytes\n", size);
(gdb)
70 retval = 0;
(gdb)
61 debug("Loaded %lu bytes\n", size);
(gdb)
63 debug("entry point is %#lx\n", start);
(gdb)
64 printf("Evaluating FCode...\n");
(gdb)
35 dstack[++dstackcnt] = (value);
(gdb)
68 fword("byte-load");
(gdb)
35 dstack[++dstackcnt] = (value);
(gdb)
68 fword("byte-load");
(gdb)
35 dstack[++dstackcnt] = (value);
(gdb)
68 fword("byte-load");
(gdb)
35 dstack[++dstackcnt] = (value);
(gdb)
68 fword("byte-load");
(gdb)
35 dstack[++dstackcnt] = (value);
(gdb)
68 fword("byte-load");
(gdb)
35 dstack[++dstackcnt] = (value);
(gdb)
68 fword("byte-load");
(gdb)
35 dstack[++dstackcnt] = (value);
(gdb)
68 fword("byte-load");
(gdb)
35 dstack[++dstackcnt] = (value);
(gdb)
68 fword("byte-load");
(gdb)
That last entry - fword("byte-load"); - is where I get the Unhandled Exception in Qemu. So, I also set fcode-verbose to true and ran it, again - the output is very, very long, so I won't include it in the e-mail, but the last few lines look like this:
5b82 : b(:) [ 0xb7 ]
5b84 : (compile) parse-bootargs [ 0x8cf ]
5b86 : (compile) halt? [ 0x8ce ]
5b87 : (compile) b?branch [ 0x14 ] (offset) 1d
5b8a : (compile) b(") [ 0x12 ] (const) Halted with -H flag.
5ba1 : (compile) type [ 0x90 ]
5ba2 : (compile) cr [ 0x92 ]
5ba3 : (compile) exit [ 0x33 ]
5ba4 : (compile) b(>resolve) [ 0xb2 ]
5ba6 : (compile) get-bootdev [ 0x8a0 ]
5ba8 : (compile) load-pkg [ 0x89f ]
5baa : (compile) mount-root [ 0x8a1 ]
5bac : (compile) zflag? [ 0x8c4 ]
5bae : (compile) nested? [ 0x892 ]
5baf : (compile) invert [ 0x26 ]
5bb0 : (compile) and [ 0x23 ]
5bb1 : (compile) b?branch [ 0x14 ] (offset) 7
5bb5 : (compile) fs-name$ [ 0x8c6 ]
5bb7 : (compile) open-zfs-fs [ 0x8c7 ]
5bb8 : (compile) b(>resolve) [ 0xb2 ]
5bba : (compile) load-file [ 0x8e4 ]
5bbc : (compile) setup-props [ 0x8e5 ]
5bbe : (compile) exec-file [ 0x8e6 ]
5bbf : (compile) b(;) [ 0xc2 ]
5bc0 : 0 [ 0xa5 ]
5bc1 : b(to) [ 0xc3 ]
5bc5 : do-boot [ 0x8e7 ]
Unaligned access to 0x0000000000000014 from 0x00000000ffd10d9c
Unhandled Exception 0x0000000000000034
Let me know if you want me to attach the full output, or if this is helpful at all...
-Nick
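For readers following the gdb session above, the loop at fcodeload.c lines 24-47 appears to scan the first 16 blocks of 512 bytes for an FCode header and then read the image length from it. The sketch below reproduces that idea over an in-memory buffer. It is not the OpenBIOS code: the function name is hypothetical, the set of start bytes is taken from IEEE 1275 (start0, start1, start2, start4, version1), and the low two bytes of the length are an assumption, since the gdb output truncates that expression.

```c
#include <stddef.h>
#include <stdint.h>

#define BLOCK_SIZE   512u  /* step used by the loop in the gdb trace */
#define MAX_BLOCKS   16u   /* the trace shows offset < 16 * 512 */
#define HEADER_SIZE  8u    /* start byte, format, 2-byte checksum, 4-byte length */

/*
 * Hypothetical helper: returns the byte offset of the first block that begins
 * with an FCode start byte, or -1 if none is found. On success *size receives
 * the big-endian 32-bit length field from header bytes 4..7.
 */
static long find_fcode_image(const uint8_t *image, size_t image_len, uint32_t *size)
{
    for (size_t offset = 0; offset < MAX_BLOCKS * BLOCK_SIZE; offset += BLOCK_SIZE) {
        if (offset + HEADER_SIZE > image_len)
            break;

        const uint8_t *hdr = image + offset;
        switch (hdr[0]) {
        case 0xf0: /* start0 */
        case 0xf1: /* start1 */
        case 0xf2: /* start2 */
        case 0xf3: /* start4 */
        case 0xfd: /* version1 */
            /* Bytes 6 and 7 are assumed; the trace only shows bytes 4 and 5. */
            *size = ((uint32_t)hdr[4] << 24) | ((uint32_t)hdr[5] << 16) |
                    ((uint32_t)hdr[6] << 8)  |  (uint32_t)hdr[7];
            return (long)offset;
        default:
            break;
        }
    }
    return -1;
}
```

In the trace, the header is found on the third pass through the loop, after which the image is read to 0x4000 and handed to byte-load.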
Nick Couchman wrote:
[...] That last entry - fword("byte-load"); - is where I get the Unhandled Exception in Qemu. So, I also set fcode-verbose to true and ran it, again - the output is very, very long, so I won't include it in the e-mail, but the last few lines look like this:
5b82 : b(:) [ 0xb7 ]
[...]
5bbf : (compile) b(;) [ 0xc2 ]
Yup. Above is compiling all the FCode from the primary bootblocks. That's good - it managed to read the initial blocks and get through evaluating them. (I fear it's going to mush itself together in the email, however. Ah well).
5bc0 : 0 [ 0xa5 ]
5bc1 : b(to) [ 0xc3 ]
5bc5 : do-boot [ 0x8e7 ]
Unaligned access to 0x0000000000000014 from 0x00000000ffd10d9c
Unhandled Exception 0x0000000000000034
Let me know if you want me to attach the full output, or if this is helpful at all...
O.k. - the above says it *did* manage to execute the do-boot. Since you've given it the -H flag, it should have stopped about four tokens in. The only thing it could be executing was "parse-bootargs", and that hadn't failed on us before.
Are you sure you're giving the command "boot cdrom -H" (capital H)?
Yep, verified that I'm executing "boot cdrom -H", where "-H" is a minus sign followed by a capital H. I get this output at boot:
0 > true to ?fcode-verbose ok
0 > boot cdrom -H
[sparc64] Booting file 'cdrom' with parameters '-H'
Not a bootable ELF image
Not a Linux kernel image
Not a bootable a.out image
Loading FCode image...
Loaded 7120 bytes
entry point is 0x4000
Evaluating FCode...
byte-load: evaluating fcode at 0x4000
fcode-table at 0xffe4a818
4000 : offset16 [ 0xcc ]
4001 : named-token [ 0xb6 ] (const) fs-pkg$ (fcode#) 800
...
-Nick
Whuups. Something else I just noticed:
5bbf : (compile) b(;) [ 0xc2 ]
5bc0 : 0 [ 0xa5 ]
5bc1 : b(to) [ 0xc3 ]
5bc5 : do-boot [ 0x8e7 ]
We seem to be missing a word. After the compile is done, we should have "0 to my-self" followed by "do-boot". I don't see the my-self there. If it's just a peculiarity of the debugger to not show the destination, no problem - but if he tries to do "0 to do-boot", that will barf.
On 2009/11/19 at 13:59, Tarl Neustaedter Tarl.Neustaedter@Sun.COM wrote:
[...]
If it's just a peculiarity of the debugger to not show the destination, no problem - but if he tries to do "0 to do-boot", that will barf.
Not sure...here are some other bits of output with "(to)":
(offset) e
584b : (compile) fs-name [ 0x8c5 ]
584c : (compile) swap [ 0x49 ]
584d : (compile) move [ 0x78 ]
584e : (compile) -1 [ 0xa4 ]
584f : (compile) b(to) [ 0xc3 ]
5852 : (compile) bbranch [ 0x13 ] (offset) 5

5a8a : b(:) [ 0xb7 ]
5a8b : (compile) dup [ 0x47 ]
5a8c : (compile) b(to) [ 0xc3 ]
5a90 : (compile) [ 0x8d7 ]
5a91 : (compile) b(lit) [ 0x10 ]
5a96 : (compile) <> [ 0x3d ]
5a97 : (compile) b?branch [ 0x14 ]

5b0d : (compile) die [ 0x809 ]
5b0e : (compile) b(>resolve) [ 0xb2 ]
5b0f : (compile) b(to) [ 0xc3 ]
5b12 : (compile) swap [ 0x49 ]
5b13 : (compile) b(to) [ 0xc3 ]
5b16 : (compile) swap [ 0x49 ]
5b18 : (compile) [ 0x898 ]
5b19 : (compile) b(;) [ 0xc2 ]
But I don't know if those are similar cases or not - a couple of them seem to involve memory addresses...
-Nick
Not sure...here are some other bits of output with "(to)":
O.k. - it appears this debugger simply doesn't show the destination. So we're good there - it was a red herring.
If we're still getting to "seek" and blowing up, that says that for whatever reason the -H isn't getting parsed correctly. Not sure what to do about that. If you get desperate, you might try to patch out the "do-boot" call at the end of the FCode on the ISO image, so it does the load but doesn't execute the do-boot.
Don't know what else to suggest.
On 2009/11/19 at 14:30, Tarl Neustaedter Tarl.Neustaedter@Sun.COM wrote:
[...]
If you get desperate, you might try to patch out the "do-boot" call at the end of the FCode on the ISO image, so it does the load but doesn't execute the do-boot.
Not getting desperate, just think it would be very cool to get qemu-system-sparc64 to boot Solaris correctly, and willing to do some work to try to make that happen. I'll keep poking around and get more familiar with the debuggers, and wait and see if any of the other folks on the list have any suggestions. Thanks for all your help, Tarl!
Just for kicks I tried to use "-h" (lower-case "h"), instead, and that doesn't parse any better than -H.
-Nick
Stefan Reinauer wrote:
This still very much looks like it's using the size of a string as an address somewhere :-(
Stefan
Yeah, that was my take on it too. I spent a bit of time taking your patch and playing with call-method, and managed to get a bit further; at least stepping through read in the debugger showed something that looked like a Milax CDROM sector. With the attached patch applied to current SVN, I now get the following:
OpenBIOS for Sparc64
Configuration device id QEMU version 1 machine id 0
CPUs: 1 x SUNW,UltraSPARC-II
UUID: 00000000-0000-0000-0000-000000000000
Welcome to OpenBIOS v1.0 built on Nov 19 2009 21:42
Type 'help' for detailed information
[sparc64] Booting file 'cdrom' with parameters '' Not a bootable ELF image Not a Linux kernel image Not a bootable a.out image Loading FCode image... Loaded 7084 bytes entry point is 0x4000 Evaluating FCode... reserved fcode word. reserved fcode word. reserved fcode word. reserved fcode word. reserved fcode word. reserved fcode word. reserved fcode word. reserved fcode word. reserved fcode word. reserved fcode word. reserved fcode word. reserved fcode word. reserved fcode word. reserved fcode word. reserved fcode word. reserved fcode word. reserved fcode word. reserved fcode word. reserved fcode word. reserved fcode word. reserved fcode word. reserved fcode word. reserved fcode word. reserved fcode word. reserved fcode word. reserved fcode word. reserved fcode word. reserved fcode word. reserved fcode word. reserved fcode word. reserved fcode word. reserved fcode word. reserved fcode word. reserved fcode word. reserved fcode word. reserved fcode word. reserved fcode word. reserved fcode word. reserved fcode word. reserved fcode word.
Can't open boot_archive
byte-load: exception caught!
0 >
ATB,
Mark.
Mark Cave-Ayland wrote:
[...]
Evaluating FCode... reserved fcode word. reserved fcode word. reserved fcode word. reserved fcode word. reserved fcode word.
[..]
Awesome! Can you find out the fcode numbers of those words? Sounds like the Solaris bootloader is expecting some Fcode extensions that openbios does not (yet) implement (such as 64bit extensions?)
Stefan
Stefan Reinauer wrote:
Awesome! Can you find out the fcode numbers of those words? Sounds like the Solaris bootloader is expecting some Fcode extensions that openbios does not (yet) implement (such as 64bit extensions?)
Stefan
No worries. The offending FCodes are:
0x246 - x@ ( oaddr -- o ) Fetch octlet from an octlet aligned address
0x247 - x! ( o oaddr -- ) Store octlet to an octlet aligned address
Do you reckon you could come up with an implementation of these relatively easily?
ATB,
Mark.
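To make the two stack effects concrete, here is a minimal C sketch of what x@ and x! have to do on a toy data stack. It is not the OpenBIOS implementation: the stack, its size, and the function names are hypothetical, and it assumes a host where a cell can hold a pointer.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Toy data stack; slot 0 is left unused, mirroring dstack[++dstackcnt]. */
typedef uint64_t cell;
static cell dstack[32];
static int dstackcnt = 0;

static void push(cell v)
{
    assert(dstackcnt < 31);
    dstack[++dstackcnt] = v;
}

static cell pop(void)
{
    assert(dstackcnt > 0);
    return dstack[dstackcnt--];
}

/* x@ ( oaddr -- o ): fetch an octlet from an octlet-aligned address. */
static void x_fetch(void)
{
    uintptr_t addr = (uintptr_t)pop();
    assert((addr & 7u) == 0);           /* the word requires 8-byte alignment */
    uint64_t o;
    memcpy(&o, (const void *)addr, sizeof o);
    push(o);
}

/* x! ( o oaddr -- ): store an octlet to an octlet-aligned address. */
static void x_store(void)
{
    uintptr_t addr = (uintptr_t)pop();  /* address is on top of the stack */
    uint64_t o = pop();                 /* value is underneath it */
    assert((addr & 7u) == 0);
    memcpy((void *)addr, &o, sizeof o);
}
```

Pushing a value and then the address of an aligned 64-bit variable before calling x_store() writes the value; pushing the address alone and calling x_fetch() leaves the value on the stack.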
Mark Cave-Ayland wrote:
[...] No worries. The offending FCodes are:
0x246 - x@ ( oaddr -- o ) Fetch octlet from an octlet aligned address
0x247 - x! ( o oaddr -- ) Store octlet to an octlet aligned address
Oops. Yup, during the Solaris 10->Solaris Nevada rewrite, they added a bunch of x@ and x! throughout the code.
On 11/20/09 10:09 AM, Mark Cave-Ayland wrote:
[...]
Do you reckon you could come up with an implementation of these relatively easily?
Please update to r617... I added a (pretty much untested, but it does not break things that were not broken before) implementation of the IEEE Draft Std P1275.6/D5 64 Bit Extensions and tried adding some (probably rarely used) words that were unimplemented in forth/device/others.fs
Please give it a shot to see if it changes things
Stefan
Stefan Reinauer wrote:
Please update to r617... I added a (pretty much untested, but it does not break things that were not broken before) implementation of the IEEE Draft Std P1275.6/D5 64 Bit Extensions and tried adding some (probably rarely used) words that were unimplemented in forth/device/others.fs
Ooooh - nice work! You're obviously more fluent in Forth than most people here ;)
Please give it a shot to see if it changes things
Yes it does, but with my patch from yesterday also applied, now it looks as if something else is broken:
OpenBIOS for Sparc64
Configuration device id QEMU version 1 machine id 0
CPUs: 1 x SUNW,UltraSPARC-II
UUID: 00000000-0000-0000-0000-000000000000
Welcome to OpenBIOS v1.0 built on Nov 20 2009 15:53
Type 'help' for detailed information

[sparc64] Booting file 'cdrom' with parameters ''
Not a bootable ELF image
Not a Linux kernel image
Not a bootable a.out image
Loading FCode image...
Loaded 7084 bytes
entry point is 0x4000
Evaluating FCode...
Unhandled Exception 0x00000004ff702000
PC = 0x00000000ffd102b8 NPC = 0x00000000ffd102bc
Stopping execution
BTW is there any chance you could review and apply the patch for debugger here: http://lists.openbios.org/pipermail/openbios/2009-November/004063.html.
Many thanks,
Mark.
On 11/20/09 5:02 PM, Mark Cave-Ayland wrote:
Please give it a shot to see if it changes things
Yes it does, but with my patch from yesterday also applied, now it looks as if something else is broken:
[...]
Evaluating FCode...
Unhandled Exception 0x00000004ff702000
Sorry to say, it seems I badly implemented the most important word unaligned-x!
Going to check in a fix in a few minutes (this time I better test it)
BTW is there any chance you could review and apply the patch for debugger here: http://lists.openbios.org/pipermail/openbios/2009-November/004063.html.
I'll have a look...
Stefan
Stefan Reinauer wrote:
[...] Sorry to say, it seems I badly implemented the most important word unaligned-x!
I was wondering about that.... Wouldn't it have been easier to do something like:
>r xlsplit r@ l! r> la+ l!
(Of course, I haven't tested that, either)
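The idea in that one-liner (split the 64-bit value into two 32-bit halves and store them one after the other) can be sketched in C as follows. This is an illustration under assumptions, not the fix that was checked in: the function names are hypothetical, big-endian layout is assumed because the target is SPARC, and the bytes are written one at a time so that the sketch tolerates any alignment, which goes beyond what l! itself guarantees.

```c
#include <stdint.h>

/* Hypothetical helpers for illustration; NOT the OpenBIOS implementation. */

/* Store one 32-bit quadlet in big-endian order, byte by byte. */
static void store_quadlet_be(uint32_t q, uint8_t *addr)
{
    addr[0] = (uint8_t)(q >> 24);
    addr[1] = (uint8_t)(q >> 16);
    addr[2] = (uint8_t)(q >> 8);
    addr[3] = (uint8_t)q;
}

/* Rough analogue of unaligned-x! ( o addr -- ). */
static void unaligned_store_octlet(uint64_t o, uint8_t *addr)
{
    uint32_t lo = (uint32_t)o;           /* low half, as xlsplit would leave it  */
    uint32_t hi = (uint32_t)(o >> 32);   /* high half, as xlsplit would leave it */

    store_quadlet_be(hi, addr);          /* high quadlet at the lower address */
    store_quadlet_be(lo, addr + 4);      /* low quadlet in the next quadlet   */
}
```

Storing the high half first matches big-endian memory order, so an aligned 64-bit load from the same address would see the original value on a big-endian machine.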
Yes it does, but with my patch from yesterday also applied, now it looks as if something else is broken:
[...]
Evaluating FCode...
Unhandled Exception 0x00000004ff702000
PC = 0x00000000ffd102b8 NPC = 0x00000000ffd102bc
Stopping execution
After updating to r624, I get the following:
0 > boot cdrom
[sparc64] Booting file 'cdrom' with parameters ''
Not a bootable ELF image
Not a Linux kernel image
Not a bootable a.out image
Loading FCode image...
Loaded 7120 bytes
entry point is 0x4000
Evaluating FCode...
Unhandled Exception 0x9000280200000000
PC = 0x00000000ffd0f05c NPC = 0x00000000ffd0f060
Stopping execution
When I use gdb to trace down the source of the exception, it looks like this:
(gdb) l *0x00000000ffd0f05c
0xffd0f05c is in enterforth (../kernel/internal.c:71).
66      #define dbg_internal_printk( a... ) printk( a )
67      #endif
68
69
70      static inline void processxt(ucell xt)
71      {
72              void (*tokenp) (void);
73
74              dbg_interp_printk("processxt: pc=%x, xt=%x\n", PC, xt);
75              tokenp = words[xt];
-Nick