Re: [OpenBIOS] Sparc64 OpenBIOS


Nick Couchman

18 Nov 2009

4:15 p.m.

On 2009/11/17 at 10:37, Tarl Neustaedter Tarl.Neustaedter@Sun.COM wrote:

[...]

...
Based on that output, it's hard for me to tell what about "open-package" failed

...
Yup. What I spot:

    : mount-root ( Empty )
    00000000ffe35d28: boot-dev$ ( ffe357a8 6 )
    00000000ffe35d30: fs-pkg$ ( ffe357a8 6 ffe317c0 10 )
    00000000ffe35d38: $open-package
    : $open-package ( ffe357a8 6 ffe317c0 10 )

It looks like fs-pkg$ names the package it's opening. I'm guessing that's hsfs-file-system, but you should probably verify that's what is in that string (do a "2dup type" before calling $open-package). If that's correct, then rather than descending into open-package, you probably want to put a breakpoint at "open" in /packages/hsfs-file-system, which is where that will end up after much wandering around. Then proceed from there - you really don't want to trace through instance creation.

Okay, I was able to do a "2dup type" and verify that hsfs-file-system is the name of the file that it's trying to open.

Again, forgive my ignorance (being new to this whole development process), but what's the best way to insert a breakpoint where you've suggested?

Thanks! -Nick

...

The reason you weren't able to see all that (which would have been overwhelming) is a bit further down:

    00000000ffe135f0: (lit) ( 1 ffe2fe38 ffe4b1d8 ffe4b548 ffe135e0 4 ffe4b558 ffe13310 )
    00000000ffe13600: catch ( 1 ffe2fe38 ffe4b1d8 ffe4b548 ffffffffffffffff 0 )

That's basically a subroutine call to ffe13310, which it didn't show.



Tarl Neustaedter

18 Nov

4:29 p.m.

New subject: Sparc64 OpenBIOS

Nick Couchman wrote:

...

[...] Okay, I was able to do a "2dup type" and verify that hsfs-file-system is the name of the file that it's trying to open.

Again, forgive my ignorance (being new to this whole development process), but what's the best way to insert a breakpoint where you've suggested?

At a guess,

    ok dev /packages/hsfs-file-system
    ok debug open
    ok dev

Mark Cave-Ayland

6:37 p.m.


Tarl Neustaedter wrote:

...

At a guess,

    ok dev /packages/hsfs-file-system
    ok debug open
    ok dev

That's correct, i.e. you need to change to the package you wish to debug and then invoke "debug <foo>" on the word you wish to step through.

Note that one thing I have found with the debugger is that you may need to manually add breakpoints for methods called during package opening using the above method, rather than being able to use "U" and "D" interactively.

AFAICT this is because the debugger needs to locate the start and end of a word in the wordlist, and if the package open fails then the new wordlist isn't set, and hence the word can't be located and added to the debug word list.

HTH,

Mark.

-- Mark Cave-Ayland - Senior Technical Architect PostgreSQL - PostGIS Sirius Corporation plc - control through freedom http://www.siriusit.co.uk t: +44 870 608 0063 Sirius Labs: http://www.siriusit.co.uk/labs

Nick Couchman

6:55 p.m.

On 2009/11/18 at 10:37, Mark Cave-Ayland mark.cave-ayland@siriusit.co.uk

wrote:

...

Tarl Neustaedter wrote:

...
At a guess,

    ok dev /packages/hsfs-file-system
    ok debug open
    ok dev

That's correct, i.e. you need to change to the package you wish to debug and then invoke "debug <foo>" on the word you wish to step through.

Note that one thing I have found with the debugger is that you may need to manually add breakpoints for methods called during package opening using the above method, rather than being able to use "U" and "D" interactively.

AFAICT this is because the debugger needs to locate the start and end of a word in the wordlist, and if the package open fails then the new wordlist isn't set, and hence the word can't be located and added to the debug word list.

(I apologize in advance - the output is kind of long, but I wanted to make sure I posted everything relevant...)

Here's the results of the effort to debug "open":

    0 > boot
    [sparc64] Booting file 'cdrom' with parameters ''
    Not a bootable ELF image
    Not a Linux kernel image
    Not a bootable a.out image
    Loading FCode image...
    Loaded 7120 bytes
    entry point is 0x4000
    Evaluating FCode...
    read-file isn't unique.

    : open ( ffffffffffffffff 1 0 ffffffffffffffff 0 0 0 ffe0b7c0 0 0 0 0 0 ffe3a120 )
    00000000ffe33c78: my-args ( ffffffffffffffff 1 0 ffffffffffffffff 0 0 0 ffe0b7c0 0 0 0 0 0 ffe3a120 ffe493c0 7 )
    00000000ffe33c80: dev-open ( ffffffffffffffff 1 0 ffffffffffffffff 0 0 0 ffe0b7c0 0 0 0 0 0 ffe3a120 ffe4c230 )
    00000000ffe33c88: dup ( ffffffffffffffff 1 0 ffffffffffffffff 0 0 0 ffe0b7c0 0 0 0 0 0 ffe3a120 ffe4c230 ffe4c230 )
    00000000ffe33c90: 0= ( ffffffffffffffff 1 0 ffffffffffffffff 0 0 0 ffe0b7c0 0 0 0 0 0 ffe3a120 ffe4c230 0 )
    00000000ffe33c98: do?branch ( ffffffffffffffff 1 0 ffffffffffffffff 0 0 0 ffe0b7c0 0 0 0 0 0 ffe3a120 ffe4c230 )
    00000000ffe33cb0: (lit) ( ffffffffffffffff 1 0 ffffffffffffffff 0 0 0 ffe0b7c0 0 0 0 0 0 ffe3a120 ffe4c230 ffe314e0 )
    00000000ffe33cc0: (to) ( ffffffffffffffff 1 0 ffffffffffffffff 0 0 0 ffe0b7c0 0 0 0 0 0 ffe3a120 )
    00000000ffe33cc8: initialize seek failed

    Can't mount root

    byte-load: exception caught! ok

So, open is failing during "initialize" - if I debug that:

    0 > do-boot
    : open ( Empty )
    00000000ffe33c78: my-args ( ffe493c0 7 )
    00000000ffe33c80: dev-open ( ffe4c7e8 )
    00000000ffe33c88: dup ( ffe4c7e8 ffe4c7e8 )
    00000000ffe33c90: 0= ( ffe4c7e8 0 )
    00000000ffe33c98: do?branch ( ffe4c7e8 )
    00000000ffe33cb0: (lit) ( ffe4c7e8 ffe314e0 )
    00000000ffe33cc0: (to) ( Empty )
    00000000ffe33cc8: initialize
    : initialize ( Empty )
    00000000ffe33448: /sector ( 800 )
    00000000ffe33450: mem-alloc ( 8004000 )
    00000000ffe33458: (lit) ( 8004000 ffe31508 )
    00000000ffe33468: (to) ( Empty )
    00000000ffe33470: get-vol-desc seek failed

    Can't mount root
    Aborted.

get-vol-desc now appears to be the culprit:

    0 > do-boot
    : open ( Empty )
    00000000ffe33c78: my-args ( ffe493c0 7 )
    00000000ffe33c80: dev-open ( ffe4d358 )
    00000000ffe33c88: dup ( ffe4d358 ffe4d358 )
    00000000ffe33c90: 0= ( ffe4d358 0 )
    00000000ffe33c98: do?branch ( ffe4d358 )
    00000000ffe33cb0: (lit) ( ffe4d358 ffe314e0 )
    00000000ffe33cc0: (to) ( Empty )
    00000000ffe33cc8: initialize
    : initialize ( Empty )
    00000000ffe33448: /sector ( 800 )
    00000000ffe33450: mem-alloc ( 8008000 )
    00000000ffe33458: (lit) ( 8008000 ffe31508 )
    00000000ffe33468: (to) ( Empty )
    00000000ffe33470: get-vol-desc
    : get-vol-desc ( Empty )
    00000000ffe318b8: vol-desc ( 8008000 )
    00000000ffe318c0: /sector ( 8008000 800 )
    00000000ffe318c8: vol-desc-sector# ( 8008000 800 10 )
    00000000ffe318d0: /sector ( 8008000 800 10 800 )
    00000000ffe318d8: * ( 8008000 800 8000 )
    00000000ffe318e0: dev-ih ( 8008000 800 8000 ffe4d358 )
    00000000ffe318e8: read-disk seek failed

    Can't mount root
    Aborted.

read-disk:

    0 > do-boot
    : open ( Empty )
    00000000ffe33c78: my-args ( ffe493c0 7 )
    00000000ffe33c80: dev-open ( ffe4d910 )
    00000000ffe33c88: dup ( ffe4d910 ffe4d910 )
    00000000ffe33c90: 0= ( ffe4d910 0 )
    00000000ffe33c98: do?branch ( ffe4d910 )
    00000000ffe33cb0: (lit) ( ffe4d910 ffe314e0 )
    00000000ffe33cc0: (to) ( Empty )
    00000000ffe33cc8: initialize
    : initialize ( Empty )
    00000000ffe33448: /sector ( 800 )
    00000000ffe33450: mem-alloc ( 800a000 )
    00000000ffe33458: (lit) ( 800a000 ffe31508 )
    00000000ffe33468: (to) ( Empty )
    00000000ffe33470: get-vol-desc
    : get-vol-desc ( Empty )
    00000000ffe318b8: vol-desc ( 800a000 )
    00000000ffe318c0: /sector ( 800a000 800 )
    00000000ffe318c8: vol-desc-sector# ( 800a000 800 10 )
    00000000ffe318d0: /sector ( 800a000 800 10 800 )
    00000000ffe318d8: * ( 800a000 800 8000 )
    00000000ffe318e0: dev-ih ( 800a000 800 8000 ffe4d910 )
    00000000ffe318e8: read-disk
    : read-disk seek failed

    Can't mount root
    Aborted.

So, for some reason, it cannot step into read-disk for debugging. If I do "see read-disk":

    0 > see read-disk
    : read-disk
      dup >r 0 swap cif-seek if " seek failed" die
      tuck swap r> cif-read <> if " read failed" die
    ; ok

And, all of these seem to be primitive words - e.g. swap, cif-seek, etc., cannot be debugged. I would guess "cif-seek" is where it's failing, which looks like this:

    0 > see cif-seek
    defer cif-seek
    is seek

And I can't seem to go any deeper than that into debugging. I'm guessing at this point maybe this is where it gets to be a Qemu issue - that there's something about the seek command in the hardware emulation itself that's failing?

Thanks, again, everyone, for all the help!

-Nick


Tarl Neustaedter

7:24 p.m.


Nick Couchman wrote:

...

[...] So, for some reason, it cannot step into read-disk for debugging. If I do "see read-disk":

    0 > see read-disk
    : read-disk
      dup >r 0 swap cif-seek if " seek failed" die
      tuck swap r> cif-read <> if " read failed" die
    ; ok

And, all of these seem to be primitive words - e.g. swap, cif-seek, etc., cannot be debugged. I would guess "cif-seek" is where it's failing, which looks like this:

These are part of the forth kernel, where it does file handling operations - they eventually end up generating reads to the HBA.

Where you are in the code itself - line 209 of:

http://src.opensolaris.org/source/xref/onnv/onnv-gate/usr/src/psm/stand/boot...

So we're doing the first cif-seek, which is the client-interface seek. See section 6.3.2.3 in IEEE 1275. That causes the disk to be read so you can cache disk accesses. The question is whether the arguments we are sending are reasonable. What we saw:

    00000000ffe318e0: dev-ih ( 8008000 800 8000 ffe4d358 )
    00000000ffe318e8: read-disk

Those arguments we see on the stack are adr, len, off, ihandle. ffe4d358 is a reasonable address for an ihandle. 0x8000 is a reasonable byte offset into the disk to read - and that's all that seek takes (the adr, len get used later by the cif-read, which we don't reach). So somehow, trying to seek to byte offset 0x8000 on the disk is failing - don't know why.
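The 0x8000 offset on the stack is the product of the two values pushed just before it in the get-vol-desc trace: vol-desc-sector# (hex 10) times /sector (hex 800). A minimal Python check of that arithmetic follows. The constant and function names are illustrative only, chosen to mirror the Forth words, and the values are read off the debugger output rather than taken from the hsfs source.

```python
# Values read off the debugger trace; names mirror the Forth words for illustration.
SECTOR_SIZE = 0x800       # /sector pushed 800 (hex), i.e. 2048 bytes
VOL_DESC_SECTOR = 0x10    # vol-desc-sector# pushed 10 (hex), i.e. sector 16


def byte_offset(sector: int, sector_size: int = SECTOR_SIZE) -> int:
    """Byte offset of the start of a sector, as computed by `*` in the trace."""
    return sector * sector_size


offset = byte_offset(VOL_DESC_SECTOR)
print(hex(offset))  # prints 0x8000
```

So the seek target itself is the expected start of sector 16, which supports the reading that the offset value is not the problem.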

Nick Couchman

8:39 p.m.

On 2009/11/18 at 11:24, Tarl Neustaedter Tarl.Neustaedter@sun.com wrote:

Nick Couchman wrote:

...
[...] So, for some reason, it cannot step into read-disk for debugging. If I do

"see read-disk":

...
    0 > see read-disk
    : read-disk
      dup >r 0 swap cif-seek if " seek failed" die
      tuck swap r> cif-read <> if " read failed" die
    ; ok

And, all of these seem to be primitive words - e.g. swap, cif-seek, etc.,

cannot be debugged. I would guess "cif-seek" is where it's failing, which looks like this:

...
These are part of the forth kernel, where it does file handling operations - they eventually end up generating reads to the HBA.

Where you are in the code itself - line 209 of:

http://src.opensolaris.org/source/xref/onnv/onnv-gate/usr/src/psm/stand/boot... lks/common/util.fth

So we're doing the first cif-seek, which is the client-interface seek. See section 6.3.2.3 in IEEE 1275. That causes the disk to be read so you can cache disk accesses. The question is whether the arguments we are sending are reasonable. What we saw:

    00000000ffe318e0: dev-ih ( 8008000 800 8000 ffe4d358 )
    00000000ffe318e8: read-disk

Those arguments we see on the stack are adr, len, off, ihandle. ffe4d358 is a reasonable address for an ihandle. 0x8000 is a reasonable byte offset into the disk to read - and that's all that seek takes (the adr, len get used later by the cif-read, which we don't reach). So somehow, trying to byte offset 0x8000 on the disk is failing - don't know why.

Well, I thought maybe I could use a SCSI-based CD-ROM in Qemu and see if it was the HBA emulation code causing the problem, but qemu doesn't seem to want to boot with a SCSI controller:

    OpenBIOS for Sparc64
    Cannot manage 'SCSI bus controller' PCI device type '<NULL>': 1000 12 (1 0 0)
    Segmentation fault

-Nick

Blue Swirl

8:45 p.m.


On Wed, Nov 18, 2009 at 9:39 PM, Nick Couchman Nick.Couchman@seakr.com wrote:


Well, I thought maybe I could use a SCSI-based CD-ROM in Qemu and see if it was the HBA emulation code causing the problem, but qemu doesn't seem to want to boot with a SCSI controller:

    OpenBIOS for Sparc64
    Cannot manage 'SCSI bus controller' PCI device type '<NULL>': 1000 12 (1 0 0)
    Segmentation fault

It's OpenBIOS, there is no driver for the HBA. There's a driver for ESP (Sparc32), but I don't think there is a PCI card with ESP chipset.

Mark Cave-Ayland

9:53 p.m.


Tarl Neustaedter wrote:

...

So we're doing the first cif-seek, which is the client-interface seek. See section 6.3.2.3 in IEEE 1275. That causes the disk to be read so you can cache disk accesses. The question is whether the arguments we are sending are reasonable. What we saw:

    00000000ffe318e0: dev-ih ( 8008000 800 8000 ffe4d358 )
    00000000ffe318e8: read-disk

Those arguments we see on the stack are adr, len, off, ihandle. ffe4d358 is a reasonable address for an ihandle. 0x8000 is a reasonable byte offset into the disk to read - and that's all that seek takes (the adr, len get used later by the cif-read, which we don't reach). So somehow, trying to byte offset 0x8000 on the disk is failing - don't know why.

Here's what I get on my Milax CD image here:

    : read-disk ( 8002000 800 8000 ffe4adc8 )
    00000000ffe31340: dup ( 8002000 800 8000 ffe4adc8 ffe4adc8 )
    00000000ffe31348: >r ( 8002000 800 8000 ffe4adc8 )
    00000000ffe31350: 0 ( 8002000 800 8000 ffe4adc8 0 )
    00000000ffe31358: swap ( 8002000 800 8000 0 ffe4adc8 )
    00000000ffe31360: cif-seek
    : seek ( 8002000 800 8000 0 ffe4adc8 )
    00000000ffe280d0: swap ( 8002000 800 8000 ffe4adc8 0 )
    00000000ffe280d8: rot ( 8002000 800 ffe4adc8 0 8000 )
    00000000ffe280e0: dup ( 8002000 800 ffe4adc8 0 8000 8000 )
    00000000ffe280e8: ihandle>phandle ( 8002000 800 ffe4adc8 0 8000 0 )
    00000000ffe280f0: (") ( 8002000 800 ffe4adc8 0 8000 0 ffe28100 4 )
    00000000ffe28108: rot ( 8002000 800 ffe4adc8 0 8000 ffe28100 4 0 )
    00000000ffe28110: find-method ( 8002000 800 ffe4adc8 0 8000 0 )
    00000000ffe28118: do?branch ( 8002000 800 ffe4adc8 0 8000 )
    00000000ffe28148: 3drop ( 8002000 800 )
    00000000ffe28150: -1 ( 8002000 800 ffffffffffffffff )
    00000000ffe28158: (semis) [ Finished seek ] ( 8002000 800 ffffffffffffffff )
    00000000ffe31368: do?branch ( 8002000 800 )
    00000000ffe31378: (") ( 8002000 800 ffe31388 b )
    00000000ffe31398: die seek failed

    Can't mount root
    Aborted.

Hmmm it looks to me as if the client interface seek word is expecting the arguments in a different order - I would expect ihandle>phandle to be executed on ffe4adc8, not on 0.

HTH,

Mark.
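The stack shuffle in that trace can be replayed outside the firmware. The following is a minimal Python sketch, not OpenBIOS code: it models only the three words `swap rot dup` that the trace shows at the start of seek, using a list whose last element is the top of stack. The second starting order (ihandle deepest) is a hypothetical input, included to show which layout would put the ihandle where ihandle>phandle looks.

```python
def swap(s):  # ( a b -- b a )
    s[-1], s[-2] = s[-2], s[-1]


def rot(s):   # ( a b c -- b c a )
    s.append(s.pop(-3))


def dup(s):   # ( a -- a a )
    s.append(s[-1])


def old_seek_prefix(stack):
    """Apply `swap rot dup` and return the value ihandle>phandle would consume."""
    s = list(stack)
    swap(s)
    rot(s)
    dup(s)
    return s[-1]


IHANDLE = 0xFFE4ADC8

# Order seen on entry to seek in the trace: 8000 0 ffe4adc8 (ihandle on top).
as_called = old_seek_prefix([0x8000, 0, IHANDLE])
# Hypothetical order with the ihandle deepest and the offset on top.
ihandle_deepest = old_seek_prefix([IHANDLE, 0, 0x8000])

print(hex(as_called), hex(ihandle_deepest))  # prints 0x8000 0xffe4adc8
```

With the order the boot code actually used, the value handed to ihandle>phandle is the byte offset 0x8000, which matches the trace line where ihandle>phandle leaves 0 and find-method then fails.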

Mark Cave-Ayland

10:01 p.m.


Mark Cave-Ayland wrote:

...

Hmmm it looks to me as if the client interface seek word is expecting the arguments in a different order - I would expect ihandle>phandle to be executed on ffe4adc8, not on 0.

Referring to the IEEE-1275 spec, the arguments for seek are listed as:

IN: ihandle, pos.hi, pos.lo

whereas the comments in forth/system/ciface.fs say:

( ihandle pos_hi pos_lo -- status )

Has OpenBIOS misinterpreted the spec, in that arguments in the OF spec should read top of the stack to bottom of the stack from left to right, rather than the other way around? Then again, if this were the case, would other OpenBIOS-based client interfaces not have discovered this before?

ATB,

Mark.

Tarl Neustaedter

10:44 p.m.


Mark Cave-Ayland wrote:


Referring to the IEEE-1275 spec, the arguments for seek are listed as:

IN: ihandle, pos.hi, pos.lo

whereas the comments in forth/system/ciface.fs say:

( ihandle pos_hi pos_lo -- status )

Has OpenBIOS misinterpreted the spec, in that arguments in the OF spec should read top of the stack to bottom of the stack from left to right, rather than the other way around? Then again, if this were the case, would other OpenBIOS-based client interfaces not have discovered this before?

That looks incorrect. The Client Interface part of the specification shows the arguments in backwards order (C order :-) compared to the rest of the specification. At the top of section 6.3.1, it states arguments are specified in the order arg1, ... argn. This is backwards from normal Forth notation, which is argn, ... arg2, arg1. So it appears there is an error in forth/system/ciface.fs. The corresponding code in Sun's OBP (obp/os/bootprom/clientif.fth) shows:

cif: seek ( low,high ihandle -- status )
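The convention described above amounts to reversing the client-interface argument list to obtain the Forth stack comment. A small Python sketch of that mapping follows; the function name is made up for illustration, and the only inputs are the IN: list quoted from the spec earlier in the thread.

```python
def cif_to_stack_comment(cif_args):
    """Convert a client-interface list (arg1 ... argn) into Forth
    stack-comment order, where the rightmost item is the top of stack."""
    return list(reversed(cif_args))


seek_in = ["ihandle", "pos.hi", "pos.lo"]   # IN: list as quoted from IEEE 1275
stack_order = cif_to_stack_comment(seek_in)

print("( " + " ".join(stack_order) + " -- status )")
# prints ( pos.lo pos.hi ihandle -- status )
```

The result has the same shape as the Sun OBP stack comment, with the ihandle on top of the stack.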

Nick Couchman

11:27 p.m.

On 2009/11/18 at 14:44, Tarl Neustaedter Tarl.Neustaedter@Sun.COM wrote:


That looks incorrect. The Client Interface part of the specification shows the arguments in backwards order (C order :-) compared to the rest of the specification. At the top of section 6.3.1, it states arguments are specified in the order arg1, ... argn. This is backwards from normal Forth notation, which is argn, ... arg2, arg1. So it appears there is an error in forth/system/ciface.fs. The corresponding code in Sun's OBP (obp/os/bootprom/clientif.fth) shows:

cif: seek ( low,high ihandle -- status )

So, is this a change that needs to be made in the OpenBIOS forth code, in the Qemu code, or somewhere else??

-Nick

Tarl Neustaedter

11:40 p.m.


Nick Couchman wrote:

...

...
[....] cif: seek ( low,high ihandle -- status )

So, is this a change that needs to be made in the OpenBIOS forth code, in the Qemu code, or somewhere else??

It looks like the Forth code is simply wrong. The first few tokens in forth/system/ciface.fs:seek are clearly expecting arguments in the wrong order ("swap rot dup ihandle>phandle" expects ihandle in the 3rd argument rather than 1st). The question is what in OpenBIOS depends on that misbehaviour, and there I am very much out of my depth.

Stefan Reinauer

19 Nov

9:24 a.m.


Tarl Neustaedter wrote:

...

It looks like the Forth code is simply wrong. The first few tokens in forth/system/ciface.fs:seek are clearly expecting arguments in the wrong order ("swap rot dup ihandle>phandle" expects ihandle in the 3rd argument rather than 1st). The question is what in OpenBIOS depends on that misbehaviour, and there I am very much out of my depth.

Nothing _in_ OpenBIOS should be using client interface methods. If it did, we should fix that as we proceed.

With such a bug I wonder, however, how we could ever boot a client.

Attached patch should improve the behavior. It's untested though, possibly additional cleanup is needed after call-method...

reverse order of parameters for 6.3.2.3 Device I/O client interface functions.

Signed-off-by: Stefan Reinauer stepan@coresystems.de

    Index: forth/system/ciface.fs
    ===================================================================
    --- forth/system/ciface.fs (revision 613)
    +++ forth/system/ciface.fs (working copy)
    @@ -212,21 +212,16 @@
       close-dev
     ;
     
    -: read ( ihandle addr len -- actual )
    -  rot dup ihandle>phandle " read" rot find-method
    -  if swap call-package else 3drop -1 then
    +: read ( len addr ihandle -- actual )
    +  rot swap " read" call-method
     ;
     
    -: write ( ihandle addr len -- actual )
    -  rot dup ihandle>phandle " write" rot find-method
    -  if swap call-package else 3drop -1 then
    +: write ( len addr ihandle -- actual )
    +  rot swap " write" call-method
     ;
     
    -: seek ( ihandle pos_hi pos_lo -- status )
    -  \ package methods uses ( pos_lo pos_hi -- status )
    -  swap
    -  rot dup ihandle>phandle " seek" rot find-method
    -  if swap call-package else 3drop -1 then
    +: seek ( pos_lo pos_hi ihandle -- status )
    +  " seek" call-method
     ;
     
    @@ -261,7 +256,7 @@
     
     : interpret ( xxx cmdstring -- ??? catch-reult )
       dup cstrlen
    -  \ ." INTERPRETE: --- " 2dup type
    +  \ ." INTERPRET: --- " 2dup type
       ['] evaluate catch dup if
         \ this is not necessary an error...
         ." interpret: exception " dup . ." caught" cr

Mark Cave-Ayland

10:50 a.m.


Stefan Reinauer wrote:

...

Nothing _in_ OpenBIOS should be using client interface methods. If it did, we should fix that as we proceed.

With such a bug I wonder, however, how we could ever boot a client.

Yeah, that was one of my thoughts. Do we have a complete list of clients that have booted from OpenBIOS anywhere?

...

Attached patch should improve the behavior. It's untested though, possibly additional cleanup is needed after call-method...

Things get very slightly further, but it now throws an exception -21 (method not found):

    Evaluating FCode...
    call-method : exception -21

seek failed

Can't mount root

byte-load: exception caught!

Here's the debug output with your patch applied:

    : seek ( 8002000 800 8000 0 ffe4adc8 )
    00000000ffe28030: (") ( 8002000 800 8000 0 ffe4adc8 ffe28040 4 )
    00000000ffe28048: call-method
    : call-method ( 8002000 800 8000 0 ffe4adc8 ffe28040 4 )
    00000000ffe27db8: dup ( 8002000 800 8000 0 ffe4adc8 ffe28040 4 4 )
    00000000ffe27dc0: 0= ( 8002000 800 8000 0 ffe4adc8 ffe28040 4 0 )
    00000000ffe27dc8: do?branch ( 8002000 800 8000 0 ffe4adc8 ffe28040 4 )
    00000000ffe27e18: dup ( 8002000 800 8000 0 ffe4adc8 ffe28040 4 4 )
    00000000ffe27e20: >r ( 8002000 800 8000 0 ffe4adc8 ffe28040 4 )
    00000000ffe27e28: dup ( 8002000 800 8000 0 ffe4adc8 ffe28040 4 4 )
    00000000ffe27e30: cstrlen ( 8002000 800 8000 0 ffe4adc8 ffe28040 4 0 )
    00000000ffe27e38: rot ( 8002000 800 8000 0 ffe4adc8 4 0 ffe28040 )
    00000000ffe27e40: ?ihandle ( 8002000 800 8000 0 ffe4adc8 4 0 ffe28040 )
    00000000ffe27e48: (lit) ( 8002000 800 8000 0 ffe4adc8 4 0 ffe28040 ffe130d8 )
    00000000ffe27e58: catch ( 8002000 800 8000 0 ffe4adc8 9 ffffffffffffffdf ffe06d78 ffffffffffffffdf )
    00000000ffe27e60: dup ( 8002000 800 8000 0 ffe4adc8 9 ffffffffffffffdf ffe06d78 ffffffffffffffdf ffffffffffffffdf )
    00000000ffe27e68: do?branch ( 8002000 800 8000 0 ffe4adc8 9 ffffffffffffffdf ffe06d78 ffffffffffffffdf )
    00000000ffe27e78: (") ( 8002000 800 8000 0 ffe4adc8 9 ffffffffffffffdf ffe06d78 ffffffffffffffdf ffe27e88 c )
    00000000ffe27e98: type call-method ( 8002000 800 8000 0 ffe4adc8 9 ffffffffffffffdf ffe06d78 ffffffffffffffdf )
    00000000ffe27ea0: r@ ( 8002000 800 8000 0 ffe4adc8 9 ffffffffffffffdf ffe06d78 ffffffffffffffdf 4 )
    00000000ffe27ea8: dup ( 8002000 800 8000 0 ffe4adc8 9 ffffffffffffffdf ffe06d78 ffffffffffffffdf 4 4 )
    00000000ffe27eb0: cstrlen ( 8002000 800 8000 0 ffe4adc8 9 ffffffffffffffdf ffe06d78 ffffffffffffffdf 4 0 )
    00000000ffe27eb8: type ( 8002000 800 8000 0 ffe4adc8 9 ffffffffffffffdf ffe06d78 ffffffffffffffdf )
    00000000ffe27ec0: (") ( 8002000 800 8000 0 ffe4adc8 9 ffffffffffffffdf ffe06d78 ffffffffffffffdf ffe27ed0 c )
    00000000ffe27ee0: type : exception ( 8002000 800 8000 0 ffe4adc8 9 ffffffffffffffdf ffe06d78 ffffffffffffffdf )
    00000000ffe27ee8: dup ( 8002000 800 8000 0 ffe4adc8 9 ffffffffffffffdf ffe06d78 ffffffffffffffdf ffffffffffffffdf )
    00000000ffe27ef0: . -21 ( 8002000 800 8000 0 ffe4adc8 9 ffffffffffffffdf ffe06d78 ffffffffffffffdf )
    00000000ffe27ef8: cr ( 8002000 800 8000 0 ffe4adc8 9 ffffffffffffffdf ffe06d78 ffffffffffffffdf )
    00000000ffe27f00: r> ( 8002000 800 8000 0 ffe4adc8 9 ffffffffffffffdf ffe06d78 ffffffffffffffdf 4 )
    00000000ffe27f08: drop ( 8002000 800 8000 0 ffe4adc8 9 ffffffffffffffdf ffe06d78 ffffffffffffffdf )
    00000000ffe27f10: (semis) [ Finished call-method ] ( 8002000 800 8000 0 ffe4adc8 9 ffffffffffffffdf ffe06d78 ffffffffffffffdf )
    00000000ffe28050: (semis) [ Finished seek ] seek failed

    Can't mount root
    Aborted.
    0 >

On first glance, it looks as if cstrlen is returning 0 which is probably stopping the method name from being set correctly.

HTH,

Mark.

Tarl Neustaedter

11:31 a.m.


Mark Cave-Ayland wrote:

...

[...] On first glance, it looks as if cstrlen is returning 0 which is probably stopping the method name from being set correctly.

That code too is completely wrong. The arguments for $call-method are ( adr len ihandle ), and it appears to be passing ( ihandle adr len ).

It looks like $call-method is doing the right thing, at least - which means it chokes trying to use the "len" as an ihandle.

Nick Couchman

4:50 p.m.


So, does call-method in ciface.fs need to be modified to send the arguments to $call-method correctly? Or is it seek in ciface.fs that needs to be corrected?

-Nick

...

...
...
On 2009/11/19 at 03:31, Tarl Neustaedter Tarl.Neustaedter@Sun.COM wrote:

Mark Cave-Ayland wrote:

...
[...] On first glance, it looks as if cstrlen is returning 0 which is probably stopping the method name from being set correctly.

That code too is completely wrong. The arguments for $call-method are ( adr len ihandle ), and it appears to be passing ( ihandle adr len ).

It looks like $call-method is doing the right thing, at least - which means it chokes trying to use the "len" as an ihandle.

Tarl Neustaedter

5:13 p.m.


Nick Couchman wrote:

...

So, does call-method in ciface.fs need to be modified to send the arguments to $call-method correctly? Or is it seek in ciface.fs that needs to be corrected?

It looks like call-method is correct, it was the call from seek that was wrong.

Nick Couchman

5:27 p.m.

On 2009/11/19 at 09:13, Tarl Neustaedter Tarl.Neustaedter@Sun.COM wrote:

Nick Couchman wrote:

...
So, does call-method in ciface.fs need to be modified to send the arguments

to $call-method correctly? Or is it seek in ciface.fs that needs to be corrected?

...

It looks like call-method is correct, it was the call from seek that was wrong.

So, does it just need to be changed from: " seek" call-method

to: rot swap " seek" call-method

like read and write? It looks like the arguments to seek are a little different from read and write, and when I try changing it to that and rebuilding OpenBIOS, I get exactly the same exception -21.

(Sorry if that's a stupid mistake on my part - I'm very new to Forth, so I'm still trying to wrap my head around it :-).

-Nick

Tarl Neustaedter

5:40 p.m.


[...]

...

So, does it just need to be changed from: " seek" call-method

to: rot swap " seek" call-method

More like " seek" rot call-method
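The three candidate bodies for seek discussed so far differ only in where the ihandle ends up relative to the method-name string. The sketch below replays each one on a Python list (last element is top of stack). It is a model of the stack shuffling only: the `"adr"` placeholder and the `run` helper are invented for illustration, and the ( adr len ihandle ) layout it compares against is the one stated in this thread for $call-method. Whether call-method in ciface.fs consumes its arguments in exactly that order is not settled here.

```python
def swap(s):  # ( a b -- b a )
    s[-1], s[-2] = s[-2], s[-1]


def rot(s):   # ( a b c -- b c a )
    s.append(s.pop(-3))


def push_seek_string(s):
    """Model `" seek"`: push a placeholder address and the string length."""
    s.extend(["adr", len("seek")])


OPS = {"swap": swap, "rot": rot, '" seek"': push_seek_string}


def run(words, stack):
    """Run a list of words against a copy of the stack and return the result."""
    s = list(stack)
    for word in words:
        OPS[word](s)
    return s


ENTRY = [0x8000, 0, "ihandle"]   # pos_lo pos_hi ihandle, as seen in the trace

patched = run(['" seek"'], ENTRY)                 # body from the patch
rot_swap = run(["rot", "swap", '" seek"'], ENTRY)  # the read/write-style variant
seek_rot = run(['" seek"', "rot"], ENTRY)          # the variant suggested above

for name, result in (("patched", patched), ("rot swap", rot_swap), ("seek rot", seek_rot)):
    print(name, result[-3:])
```

Only the last variant leaves the top three items as adr, len, ihandle. The first two leave the string length (4) on top, and the `rot swap` variant also exchanges pos_lo and pos_hi underneath.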

Nick Couchman

5:45 p.m.

On 2009/11/19 at 09:40, Tarl Neustaedter Tarl.Neustaedter@Sun.COM wrote:

[...]

...
So, does it just need to be changed from: " seek" call-method

to: rot swap " seek" call-method

More like " seek" rot call-method

Doh!

Well, no more Forth exceptions, but I think Qemu is not liking it, now:

    0 > boot cdrom
    [sparc64] Booting file 'cdrom' with parameters ''
    Not a bootable ELF image
    Not a Linux kernel image
    Not a bootable a.out image
    Loading FCode image...
    Loaded 7120 bytes
    entry point is 0x4000
    Evaluating FCode...
    Unhandled Exception 0x0000000000000034
    PC = 0x00000000ffd10de4 NPC = 0x00000000ffd10de8
    Stopping execution

-Nick

Blue Swirl

5:57 p.m.


On Thu, Nov 19, 2009 at 6:45 PM, Nick Couchman Nick.Couchman@seakr.com wrote:

...

...
...
...
On 2009/11/19 at 09:40, Tarl Neustaedter Tarl.Neustaedter@Sun.COM wrote:

[...]

...
So, does it just need to be changed from: " seek" call-method

to: rot swap " seek" call-method

More like " seek" rot call-method

Doh!

Well, no more Forth exceptions, but I think Qemu is not liking it, now:

    0 > boot cdrom
    [sparc64] Booting file 'cdrom' with parameters ''
    Not a bootable ELF image
    Not a Linux kernel image
    Not a bootable a.out image
    Loading FCode image...
    Loaded 7120 bytes
    entry point is 0x4000
    Evaluating FCode...
    Unhandled Exception 0x0000000000000034
    PC = 0x00000000ffd10de4 NPC = 0x00000000ffd10de8
    Stopping execution

This is an unaligned access exception. With GDB you could check if some address matches the PC value.

Nick Couchman

8:07 p.m.


This is unaligned access exception. With GDB you could check if some address matches the PC value.

    (gdb) l *0x00000000ffd10de4
    0xffd10de4 is in fetch (../include/openbios/stack.h:34).
    29  typedef ucell phandle_t;
    30
    31
    32
    33
    34  static inline void PUSH(ucell value) {
    35      dstack[++dstackcnt] = (value);
    36  }
    37  static inline void PUSH_xt( xt_t xt ) { PUSH( (ucell)xt ); }
    38  static inline void PUSH_ih( ihandle_t ih ) { PUSH( (ucell)ih ); }

So, something about the PUSH function that it doesn't like??

-Nick

Blue Swirl

8:24 p.m.


On Thu, Nov 19, 2009 at 9:07 PM, Nick Couchman Nick.Couchman@seakr.com wrote:

...

...
This is unaligned access exception. With GDB you could check if some address matches the PC value.

    (gdb) l *0x00000000ffd10de4
    0xffd10de4 is in fetch (../include/openbios/stack.h:34).
    29  typedef ucell phandle_t;
    30
    31
    32
    33
    34  static inline void PUSH(ucell value) {
    35      dstack[++dstackcnt] = (value);
    36  }
    37  static inline void PUSH_xt( xt_t xt ) { PUSH( (ucell)xt ); }
    38  static inline void PUSH_ih( ihandle_t ih ) { PUSH( (ucell)ih ); }

So, something about the PUSH function that it doesn't like??

More likely the address given to fetch was not aligned:

static void fetch(void)
{
    const ucell *aaddr = (ucell *)cell2pointer(POP());
    PUSH(read_ucell(aaddr));
}

Here QEMU can help: enable DEBUG_PCALL in target-sparc/op_helper.c and recompile. Then run QEMU with -d int, and /tmp/qemu.log will contain the register dump at the time of the exception.

Nick Couchman

8:30 p.m.


I enabled DEBUG_PCALL as well as DEBUG_UNALIGNED in target-sparc/op_helper.c. Output for qemu was:

0 > boot cdrom
[sparc64] Booting file 'cdrom' with parameters ''
Not a bootable ELF image
Not a Linux kernel image
Not a bootable a.out image
Loading FCode image...
Loaded 7420 bytes
entry point is 0x4000
Evaluating FCode...
Unaligned access to 0x0000000000000014 from 0x00000000ffd10d9c
Unhandled Exception 0x0000000000000034
PC = 0x00000000ffd10de4 NPC = 0x00000000ffd10de8
Stopping execution

and /tmp/qemu.log contains this at the end:

Search PC...
4550: Unaligned Memory Access (v=0034) pc=00000000ffd10de4 npc=00000000ffd10de8 SP=00000000fff10cd1
pc: 00000000ffd10de4 npc: 00000000ffd10de8
General Registers:
%g0: 0000000000000000 %g1: 00000000000000b8 %g2: 0000000000000014 %g3: 00000000ffee3d50
%g4: 00000000ffee3000 %g5: 0000000000000041 %g6: 0000000000000000 %g7: 0000000000000000
Current Register Window:
%o0: 000000000000003f %o1: 00000000ffe130f0 %o2: 0000000000000018 %o3: 00000000ffee3000
%o4: 00000000ffee3c00 %o5: 0000000000000210 %o6: 00000000fff10cd1 %o7: 00000000ffd0f068
%l0: 00000000ffee3000 %l1: 0000000000000000 %l2: 00000000ffee3000 %l3: 0000000000000000
%l4: 000000000000001a %l5: 0000000000000000 %l6: 0000000000000000 %l7: 0000000000000000
%i0: 00000000ffe1ac30 %i1: 0000000000000200 %i2: 00000000ffe00000 %i3: 0000000000000000
%i4: ffffffffffffffff %i5: 0000000000000018 %i6: 00000000fff10d91 %i7: 00000000ffd126f8

Floating Point Registers:
%f00: 000000000.000000 000000000.000000 000000000.000000 000000000.000000
%f04: 000000000.000000 000000000.000000 000000000.000000 000000000.000000
%f08: 000000000.000000 000000000.000000 000000000.000000 000000000.000000
%f12: 000000000.000000 000000000.000000 000000000.000000 000000000.000000
%f16: 000000000.000000 000000000.000000 000000000.000000 000000000.000000
%f20: 000000000.000000 000000000.000000 000000000.000000 000000000.000000
%f24: 000000000.000000 000000000.000000 000000000.000000 000000000.000000
%f28: 000000000.000000 000000000.000000 000000000.000000 000000000.000000
pstate: 0x00000016 ccr: 0x44 asi: 0x00 tl: 0 fprs: 0
cansave: 6 canrestore: 0 otherwin: 0 wstate 0 cleanwin 7 cwp 4
fsr: 0x00000000

-Nick

Stefan Reinauer

8:35 p.m.


This still very much looks like it's using the size of a string as an address somewhere :-(

Stefan
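Editorial illustration of the alignment point made in this exchange: the QEMU log reports trap 0x34 (unaligned memory access) for address 0x14, and the C fetch shown earlier reads a whole cell. A minimal sketch in C, assuming a sparc64 cell is 8 bytes; the names cell64_t, is_aligned and cell_fetch_ok are hypothetical and not from the OpenBIOS tree:

```c
#include <stdint.h>
#include <stddef.h>

/* Assumption: a sparc64 Forth cell (ucell in the thread) is 8 bytes wide. */
typedef uint64_t cell64_t;

/* Returns 1 when addr is a multiple of size; size must be a power of two. */
static int is_aligned(uint64_t addr, size_t size)
{
    return (addr & (uint64_t)(size - 1)) == 0;
}

/* Returns 1 when a cell-sized fetch from addr would be naturally aligned. */
static int cell_fetch_ok(uint64_t addr)
{
    return is_aligned(addr, sizeof(cell64_t));
}
```

With these helpers, cell_fetch_ok(0x14) is 0 while is_aligned(0x14, 4) is 1: the value 0x14 (decimal 20) would pass as a 32-bit address but not as a cell address, and a small number like that is more plausibly a length or a token than a pointer.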

Nick Couchman

9:07 p.m.


So, in trying to track this down further, I need some more help with the debugger in OpenBIOS. It seems that I have to go through the boot process at least once before I can debug certain things. For example, if I start up qemu and immediately type "debug do-boot", I'm told "could not locate word for debugging ok", and, when I try to boot, it doesn't allow me to step through do-boot - it just continues on. If I let the boot fail once, then try "debug do-boot" it allows me to step through it. Unfortunately, with this most recent error - the Unhandled Exception - the first time I boot is also the last time until I restart Qemu, making it very difficult to make it stop at the correct place in order to debug. So, is there a way that I can load the words before I actually attempt the boot so that OpenBIOS knows where to stop and so that I can track down where this unhandled exception is occurring?

-Nick

Tarl Neustaedter

9:14 p.m.

Nick Couchman wrote:

[...] So, in trying to track this down further, I need some more help with the debugger in OpenBIOS. It seems that I have to go through the boot process at least once before I can debug certain things. For example, if I start up qemu and immediately type "debug do-boot", I'm told "could not locate word for debugging ok",

Right. do-boot is defined by the primary bootblocks, the code it reads from blocks 1-15. If you're still booting from the Nevada CD, use boot cdrom -H, which will cause it to halt before it executes do-boot. You'll have to clean up -H from the arguments so you can get past the test for halt?.

I'd suggest:

ok boot cdrom -H
ok debug do-boot
ok do-boot

(step through until you see the test for halt?, then enter forth and set halt? to false).

Nick Couchman

9:18 p.m.


Unfortunately the unhandled exception occurs before this check is done:

0 > boot cdrom -H
[sparc64] Booting file 'cdrom' with parameters '-H'
Not a bootable ELF image
Not a Linux kernel image
Not a bootable a.out image
Loading FCode image...
Loaded 7420 bytes
entry point is 0x4000
Evaluating FCode...
Unaligned access to 0x0000000000000014 from 0x00000000ffd10d9c
Unhandled Exception 0x0000000000000034
PC = 0x00000000ffd10de4 NPC = 0x00000000ffd10de8
Stopping execution

And, yes, still booting from sol-nv-b127-sparc-dvd.iso file...

-Nick

Tarl Neustaedter

9:29 p.m.

Nick Couchman wrote:

[...] Unfortunately the unhandled exception occurs before this check is done:

Then we've broken something else, because we were getting past that before. We aren't getting to do-boot now.

What boot does is essentially:

<boot-device> open-dev 4000 swap " load" rot $call-method

(it's a bit more complicated, but that's the essence). We seem to be dying in either open-dev or "load" now.
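Editorial illustration of the stack shuffling in that one-liner: a minimal sketch in C, assuming the standard IEEE 1275 stack effect for $call-method ( ... method-str method-len ihandle -- ... ) and assuming 4000 is read as hex 0x4000, the entry point shown in the boot log. The mini_stack type and the ms_* helpers are hypothetical; they only replay the order of operations.

```c
#include <stddef.h>

/* Hypothetical miniature data stack, loosely modelled on a Forth stack. */
typedef unsigned long cell_t;

typedef struct {
    cell_t data[16];
    size_t depth;
} mini_stack;

static void ms_push(mini_stack *s, cell_t v)
{
    s->data[s->depth++] = v;
}

/* swap ( a b -- b a ) */
static void ms_swap(mini_stack *s)
{
    cell_t t = s->data[s->depth - 1];
    s->data[s->depth - 1] = s->data[s->depth - 2];
    s->data[s->depth - 2] = t;
}

/* rot ( a b c -- b c a ) */
static void ms_rot(mini_stack *s)
{
    cell_t a = s->data[s->depth - 3];
    s->data[s->depth - 3] = s->data[s->depth - 2];
    s->data[s->depth - 2] = s->data[s->depth - 1];
    s->data[s->depth - 1] = a;
}

/* Replays: <ihandle> 4000 swap " load" rot
 * and leaves ( 0x4000 str len ihandle ), the order $call-method expects. */
static void build_load_call(mini_stack *s, cell_t ihandle, cell_t str, cell_t len)
{
    s->depth = 0;
    ms_push(s, ihandle);   /* result of open-dev */
    ms_push(s, 0x4000);    /* load address */
    ms_swap(s);            /* ( 0x4000 ihandle ) */
    ms_push(s, str);       /* " load" pushes the string address... */
    ms_push(s, len);       /* ...and its length */
    ms_rot(s);             /* ( 0x4000 str len ihandle ) */
}
```

The same reasoning explains the earlier seek fix: the method string goes on the stack first, and rot then brings the ihandle back to the top.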

Nick Couchman

9:41 p.m.


So, using gdb, I've managed to come up with a bit more information - whether it's useful or not is beyond my ability to determine, but here goes...

I set up a breakpoint at fcode_load, because that's the C-based function where it appears to be failing. Once we're in that function, I step through:

Breakpoint 1, fcode_load (filename=0xffec3658 "cdrom") at ../arch/sparc64/fcodeload.c:21
21	if (!file_open(filename))
(gdb) next
25	file_seek(offset);
(gdb)
32	switch (fcode_header[0]) {
(gdb)
25	file_seek(offset);
(gdb)
26	if (lfile_read(&fcode_header, sizeof(fcode_header))
(gdb)
32	switch (fcode_header[0]) {
(gdb)
24	for (offset = 0; offset < 16 * 512; offset += 512) {
(gdb)
32	switch (fcode_header[0]) {
(gdb)
25	file_seek(offset);
(gdb)
26	if (lfile_read(&fcode_header, sizeof(fcode_header))
(gdb)
32	switch (fcode_header[0]) {
(gdb)
52	printf("Loading FCode image...\n");
(gdb)
50	start = 0x4000;
(gdb)
47	size = (fcode_header[4] << 24) | (fcode_header[5] << 16) |
(gdb)
52	printf("Loading FCode image...\n");
(gdb)
47	size = (fcode_header[4] << 24) | (fcode_header[5] << 16) |
(gdb)
52	printf("Loading FCode image...\n");
(gdb)
47	size = (fcode_header[4] << 24) | (fcode_header[5] << 16) |
(gdb)
52	printf("Loading FCode image...\n");
(gdb)
47	size = (fcode_header[4] << 24) | (fcode_header[5] << 16) |
(gdb)
52	printf("Loading FCode image...\n");
(gdb)
54	file_seek(offset + sizeof(fcode_header));
(gdb)
56	if ((unsigned long)lfile_read((void *)start, size) != size) {
(gdb)
61	debug("Loaded %lu bytes\n", size);
(gdb)
70	retval = 0;
(gdb)
61	debug("Loaded %lu bytes\n", size);
(gdb)
63	debug("entry point is %#lx\n", start);
(gdb)
64	printf("Evaluating FCode...\n");
(gdb)
35	dstack[++dstackcnt] = (value);
(gdb)
68	fword("byte-load");
(gdb)
35	dstack[++dstackcnt] = (value);
(gdb)
68	fword("byte-load");
(gdb)
35	dstack[++dstackcnt] = (value);
(gdb)
68	fword("byte-load");
(gdb)
35	dstack[++dstackcnt] = (value);
(gdb)
68	fword("byte-load");
(gdb)
35	dstack[++dstackcnt] = (value);
(gdb)
68	fword("byte-load");
(gdb)
35	dstack[++dstackcnt] = (value);
(gdb)
68	fword("byte-load");
(gdb)
35	dstack[++dstackcnt] = (value);
(gdb)
68	fword("byte-load");
(gdb)
35	dstack[++dstackcnt] = (value);
(gdb)
68	fword("byte-load");
(gdb)

That last entry - fword("byte-load"); - is where I get the Unhandled Exception in Qemu. So, I also set fcode-verbose to true and ran it, again - the output is very, very long, so I won't include it in the e-mail, but the last few lines look like this:

5b82 : b(:) [ 0xb7 ]
5b84 : (compile) parse-bootargs [ 0x8cf ]
5b86 : (compile) halt? [ 0x8ce ]
5b87 : (compile) b?branch [ 0x14 ] (offset) 1d
5b8a : (compile) b(") [ 0x12 ] (const) Halted with -H flag.
5ba1 : (compile) type [ 0x90 ]
5ba2 : (compile) cr [ 0x92 ]
5ba3 : (compile) exit [ 0x33 ]
5ba4 : (compile) b(>resolve) [ 0xb2 ]
5ba6 : (compile) get-bootdev [ 0x8a0 ]
5ba8 : (compile) load-pkg [ 0x89f ]
5baa : (compile) mount-root [ 0x8a1 ]
5bac : (compile) zflag? [ 0x8c4 ]
5bae : (compile) nested? [ 0x892 ]
5baf : (compile) invert [ 0x26 ]
5bb0 : (compile) and [ 0x23 ]
5bb1 : (compile) b?branch [ 0x14 ] (offset) 7
5bb5 : (compile) fs-name$ [ 0x8c6 ]
5bb7 : (compile) open-zfs-fs [ 0x8c7 ]
5bb8 : (compile) b(>resolve) [ 0xb2 ]
5bba : (compile) load-file [ 0x8e4 ]
5bbc : (compile) setup-props [ 0x8e5 ]
5bbe : (compile) exec-file [ 0x8e6 ]
5bbf : (compile) b(;) [ 0xc2 ]
5bc0 : 0 [ 0xa5 ]
5bc1 : b(to) [ 0xc3 ]
5bc5 : do-boot [ 0x8e7 ]
Unaligned access to 0x0000000000000014 from 0x00000000ffd10d9c
Unhandled Exception 0x0000000000000034

Let me know if you want me to attach the full output, or if this is helpful at all...

-Nick
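Editorial illustration of the size computation at line 47 of the gdb trace above: a minimal sketch in C of decoding a big-endian 32-bit length from bytes 4 to 7 of an 8-byte FCode header. The trace cuts the expression off after byte 5; that bytes 6 and 7 complete the value is an assumption here, and fcode_image_length and FCODE_HEADER_LEN are hypothetical names, not the ones in fcodeload.c.

```c
#include <stdint.h>

/* Assumption: the FCode header is 8 bytes, with a big-endian length in
 * bytes 4..7 (the trace only shows bytes 4 and 5 being combined). */
#define FCODE_HEADER_LEN 8

/* Decode the 32-bit big-endian length field from an FCode header.
 * Each byte is widened to uint32_t before shifting, so a byte >= 0x80
 * in position 4 cannot shift into the sign bit of a plain int. */
static uint32_t fcode_image_length(const uint8_t header[FCODE_HEADER_LEN])
{
    return ((uint32_t)header[4] << 24) |
           ((uint32_t)header[5] << 16) |
           ((uint32_t)header[6] << 8)  |
            (uint32_t)header[7];
}
```

For example, a made-up header whose last four bytes are 00 00 1b d0 decodes to 7120, the same magnitude as the "Loaded 7120 bytes" message in the boot log.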

Tarl Neustaedter

9:54 p.m.

Nick Couchman wrote:

[...] That last entry - fword("byte-load"); - is where I get the Unhandled Exception in Qemu. So, I also set fcode-verbose to true and ran it, again - the output is very, very long, so I won't include it in the e-mail, but the last few lines look like this:

5b82 : b(:) [ 0xb7 ]
5b84 : (compile) parse-bootargs [ 0x8cf ]
5b86 : (compile) halt? [ 0x8ce ]
5b87 : (compile) b?branch [ 0x14 ] (offset) 1d
5b8a : (compile) b(") [ 0x12 ] (const) Halted with -H flag.
5ba1 : (compile) type [ 0x90 ]
5ba2 : (compile) cr [ 0x92 ]
5ba3 : (compile) exit [ 0x33 ]
5ba4 : (compile) b(>resolve) [ 0xb2 ]
5ba6 : (compile) get-bootdev [ 0x8a0 ]
5ba8 : (compile) load-pkg [ 0x89f ]
5baa : (compile) mount-root [ 0x8a1 ]
5bac : (compile) zflag? [ 0x8c4 ]
5bae : (compile) nested? [ 0x892 ]
5baf : (compile) invert [ 0x26 ]
5bb0 : (compile) and [ 0x23 ]
5bb1 : (compile) b?branch [ 0x14 ] (offset) 7
5bb5 : (compile) fs-name$ [ 0x8c6 ]
5bb7 : (compile) open-zfs-fs [ 0x8c7 ]
5bb8 : (compile) b(>resolve) [ 0xb2 ]
5bba : (compile) load-file [ 0x8e4 ]
5bbc : (compile) setup-props [ 0x8e5 ]
5bbe : (compile) exec-file [ 0x8e6 ]
5bbf : (compile) b(;) [ 0xc2 ]

Yup. Above is compiling all the FCode from the primary bootblocks. That's good - it managed to read the initial blocks and get through evaluating them. (I fear it's going to mush itself together in the email, however. Ah well).

5bc0 : 0 [ 0xa5 ]
5bc1 : b(to) [ 0xc3 ]
5bc5 : do-boot [ 0x8e7 ]
Unaligned access to 0x0000000000000014 from 0x00000000ffd10d9c
Unhandled Exception 0x0000000000000034

Let me know if you want me to attach the full output, or if this is helpful at all...

O.k. - the above says it *did* manage to execute the do-boot. Since you've given it the -H flag, it should have stopped about four tokens in. The only thing it could be executing was "parse-bootargs", and that hadn't failed on us before.

Are you sure you're giving the command "boot cdrom -H" (capital H)?

Nick Couchman

10:07 p.m.


Yep, verified that I'm executing "boot cdrom -H", where "-H" is minus-H (capital H). I get this output at boot:

0 > true to ?fcode-verbose ok
0 > boot cdrom -H
[sparc64] Booting file 'cdrom' with parameters '-H'
Not a bootable ELF image
Not a Linux kernel image
Not a bootable a.out image
Loading FCode image...
Loaded 7120 bytes
entry point is 0x4000
Evaluating FCode...

byte-load: evaluating fcode at 0x4000
fcode-table at 0xffe4a818
4000 : offset16 [ 0xcc ]
4001 : named-token [ 0xb6 ] (const) fs-pkg$ (fcode#) 800
...

-Nick

Tarl Neustaedter

9:59 p.m.

Whuups. Something else I just noticed:

5bbf : (compile) b(;) [ 0xc2 ]
5bc0 : 0 [ 0xa5 ]
5bc1 : b(to) [ 0xc3 ]
5bc5 : do-boot [ 0x8e7 ]

We seem to be missing a word. After the compile is done, we should have "0 to my-self" followed by "do-boot". I don't see the my-self there. If it's just a peculiarity of the debugger to not show the destination, no problem - but if he tries to do "0 to do-boot", that will barf.

Nick Couchman

10:10 p.m.


Not sure...here are some other bits of output with "(to)":

(offset) e
584b : (compile) fs-name [ 0x8c5 ]
584c : (compile) swap [ 0x49 ]
584d : (compile) move [ 0x78 ]
584e : (compile) -1 [ 0xa4 ]
584f : (compile) b(to) [ 0xc3 ]
5852 : (compile) bbranch [ 0x13 ] (offset) 5

5a8a : b(:) [ 0xb7 ]
5a8b : (compile) dup [ 0x47 ]
5a8c : (compile) b(to) [ 0xc3 ]
5a90 : (compile) [ 0x8d7 ]
5a91 : (compile) b(lit) [ 0x10 ]
5a96 : (compile) <> [ 0x3d ]
5a97 : (compile) b?branch [ 0x14 ]

5b0d : (compile) die [ 0x809 ]
5b0e : (compile) b(>resolve) [ 0xb2 ]
5b0f : (compile) b(to) [ 0xc3 ]
5b12 : (compile) swap [ 0x49 ]
5b13 : (compile) b(to) [ 0xc3 ]
5b16 : (compile) swap [ 0x49 ]
5b18 : (compile) [ 0x898 ]
5b19 : (compile) b(;) [ 0xc2 ]

But I don't know if those are similar cases or not - a couple of them seem to involve memory addresses...

-Nick

Tarl Neustaedter

10:30 p.m.


O.k. - it appears this debugger simply doesn't show the destination. So we're good there - it was a red herring.

If we're still getting to "seek" and blowing up, that says that for whatever reason the -H isn't getting parsed correctly. Not sure what to do about that. If you get desperate, you might try to patch out the "do-boot" call at the end of the FCode on the ISO image, so it does the load but doesn't execute the do-boot.

Don't know what else to suggest.

Nick Couchman

10:41 p.m.


Not getting desperate, just think it would be very cool to get qemu-system-sparc64 to boot Solaris correctly, and willing to do some work to try to make that happen. I'll keep poking around and get more familiar with the debuggers, and wait and see if any of the other folks on the list have any suggestions. Thanks for all your help, Tarl!

Just for kicks I tried to use "-h" (lower-case "h"), instead, and that doesn't parse any better than -H.

-Nick

Mark Cave-Ayland

10:55 p.m.

Stefan Reinauer wrote:

This still very much looks like it's using the size of a string as an address somewhere :-(

Stefan

Yeah, that was my take on it too. I spent a bit of time taking your patch and playing with call-method, and managed to get a bit further; at least stepping through read in the debugger showed something that looked like a Milax CDROM sector. With the attached patch applied to current SVN, I now get the following:

OpenBIOS for Sparc64
Configuration device id QEMU version 1 machine id 0
CPUs: 1 x SUNW,UltraSPARC-II
UUID: 00000000-0000-0000-0000-000000000000
Welcome to OpenBIOS v1.0 built on Nov 19 2009 21:42
Type 'help' for detailed information

[sparc64] Booting file 'cdrom' with parameters '' Not a bootable ELF image Not a Linux kernel image Not a bootable a.out image Loading FCode image... Loaded 7084 bytes entry point is 0x4000 Evaluating FCode... reserved fcode word. reserved fcode word. reserved fcode word. reserved fcode word. reserved fcode word. reserved fcode word. reserved fcode word. reserved fcode word. reserved fcode word. reserved fcode word. reserved fcode word. reserved fcode word. reserved fcode word. reserved fcode word. reserved fcode word. reserved fcode word. reserved fcode word. reserved fcode word. reserved fcode word. reserved fcode word. reserved fcode word. reserved fcode word. reserved fcode word. reserved fcode word. reserved fcode word. reserved fcode word. reserved fcode word. reserved fcode word. reserved fcode word. reserved fcode word. reserved fcode word. reserved fcode word. reserved fcode word. reserved fcode word. reserved fcode word. reserved fcode word. reserved fcode word. reserved fcode word. reserved fcode word. reserved fcode word.

Can't open boot_archive

byte-load: exception caught!

0 >

ATB,

Mark.

Stefan Reinauer

11:39 p.m.


Awesome! Can you find out the FCode numbers of those words? Sounds like the Solaris bootloader is expecting some FCode extensions that OpenBIOS does not (yet) implement (such as the 64-bit extensions?)

Stefan

Mark Cave-Ayland

20 Nov

10:09 a.m.


No worries. The offending FCodes are:

0x246 - x@ ( oaddr -- o ) Fetch octlet from an octlet aligned address

0x247 - x! ( o oaddr -- ) Store octlet to an octlet aligned address

Do you reckon you could come up with an implementation of these relatively easily?

ATB,

Mark.
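Editorial illustration of what the two missing words do, going only by the stack comments quoted above: x@ fetches 64 bits from an octlet-aligned address and x! stores 64 bits to one. A minimal sketch in C; the helper names are hypothetical, and returning -1 on a misaligned address is a choice made for this sketch, not behaviour taken from OpenBIOS or the draft standard.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper names; not taken from the OpenBIOS tree. */

/* Returns 1 if addr is octlet (8-byte) aligned. */
static int octlet_aligned(const void *addr)
{
    return ((uintptr_t)addr % 8u) == 0;
}

/* x@ ( oaddr -- o ): fetch 64 bits from an aligned address.
 * Returns 0 on success, -1 if oaddr is not octlet aligned. */
static int octlet_fetch(const void *oaddr, uint64_t *o)
{
    if (!octlet_aligned(oaddr))
        return -1;
    memcpy(o, oaddr, sizeof *o);
    return 0;
}

/* x! ( o oaddr -- ): store 64 bits to an aligned address.
 * Returns 0 on success, -1 if oaddr is not octlet aligned. */
static int octlet_store(uint64_t o, void *oaddr)
{
    if (!octlet_aligned(oaddr))
        return -1;
    memcpy(oaddr, &o, sizeof o);
    return 0;
}
```

The values are stored in host byte order, so a store followed by a fetch at the same address round-trips on any machine.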

Tarl Neustaedter

11:18 a.m.


Oops. Yup, during the Solaris 10->Solaris Nevada rewrite, they added a bunch of x@ and x! throughout the code.

Stefan Reinauer

4:51 p.m.


Please update to r617... I added a (pretty much untested, but it does not break things that were not broken before) implementation of the IEEE Draft Std P1275.6/D5 64 Bit Extensions and tried adding some (probably rarely used) words that were unimplemented in forth/device/others.fs

Please give it a shot to see if it changes things

Stefan

-- coresystems GmbH • Brahmsstr. 16 • D-79104 Freiburg i. Br. Tel.: +49 761 7668825 • Fax: +49 761 7664613 Email: info@coresystems.de • http://www.coresystems.de/ Registergericht: Amtsgericht Freiburg • HRB 7656 Geschäftsführer: Stefan Reinauer • Ust-IdNr.: DE245674866

Mark Cave-Ayland

5:02 p.m.

Stefan Reinauer wrote:

Please update to r617... I added a (pretty much untested, but it does not break things that were not broken before) implementation of the IEEE Draft Std P1275.6/D5 64 Bit Extensions and tried adding some (probably rarely used) words that were unimplemented in forth/device/others.fs

Ooooh - nice work! You're obviously more fluent in Forth than most people here ;)


Please give it a shot to see if it changes things

Yes it does, but with my patch from yesterday also applied, now it looks as if something else is broken:

[sparc64] Booting file 'cdrom' with parameters ''
Not a bootable ELF image
Not a Linux kernel image
Not a bootable a.out image
Loading FCode image...
Loaded 7084 bytes
entry point is 0x4000
Evaluating FCode...
Unhandled Exception 0x00000004ff702000
PC = 0x00000000ffd102b8 NPC = 0x00000000ffd102bc
Stopping execution

BTW, is there any chance you could review and apply the debugger patch posted here? http://lists.openbios.org/pipermail/openbios/2009-November/004063.html

Many thanks,

Mark.

Stefan Reinauer

5:11 p.m.


Sorry to say, it seems I badly implemented the most important word unaligned-x!

Going to check in a fix in a few minutes (this time I better test it)

...

BTW is there any chance you could review and apply the patch for debugger here: http://lists.openbios.org/pipermail/openbios/2009-November/004063.html.

I'll have a look...

Stefan

Tarl Neustaedter

5:25 p.m.

Stefan Reinauer wrote:

[...] Sorry to say, it seems I badly implemented the most important word unaligned-x!

I was wondering about that.... Wouldn't it have been easier to do something like:

>r xlsplit r@ l! r> la+ l!

(Of course, I haven't tested that, either)

Tarl Neustaedter

5:33 p.m.

[...]

>r xlsplit r@ l! r> la+ l!

And if I'd been paying attention, that "la+" would have been "/l +". Sigh. And it's not endian-correct, either.
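Editorial illustration of the split-and-store idea in this message: write a 64-bit value as two 32-bit halves, high half first. A minimal sketch in C, assuming big-endian memory layout is what is wanted (as on SPARC); storing byte by byte makes it independent of both the target address alignment and the host's endianness, which addresses the endian concern raised above. The function names are hypothetical.

```c
#include <stdint.h>

/* Store 32 bits most-significant byte first, one byte at a time,
 * so the destination needs no particular alignment. */
static void store_be32(unsigned char *p, uint32_t v)
{
    p[0] = (unsigned char)(v >> 24);
    p[1] = (unsigned char)(v >> 16);
    p[2] = (unsigned char)(v >> 8);
    p[3] = (unsigned char)v;
}

/* Store 64 bits as two 32-bit halves: high half at addr, low half
 * at addr + 4, mirroring the "split, store, advance by /l, store" idea. */
static void unaligned_store_be64(unsigned char *addr, uint64_t x)
{
    uint32_t hi = (uint32_t)(x >> 32);
    uint32_t lo = (uint32_t)(x & 0xffffffffu);

    store_be32(addr, hi);
    store_be32(addr + 4, lo);
}
```

Storing 0x0102030405060708 at an odd address leaves the bytes 01 02 03 04 05 06 07 08 in ascending memory order.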

Nick Couchman

8:33 p.m.

Yes it does, but with my patch from yesterday also applied, now it looks as if something else is broken:

OpenBIOS for Sparc64
Configuration device id QEMU version 1 machine id 0
CPUs: 1 x SUNW,UltraSPARC-II
UUID: 00000000-0000-0000-0000-000000000000
Welcome to OpenBIOS v1.0 built on Nov 20 2009 15:53
Type 'help' for detailed information

[sparc64] Booting file 'cdrom' with parameters ''
Not a bootable ELF image
Not a Linux kernel image
Not a bootable a.out image
Loading FCode image...
Loaded 7084 bytes
entry point is 0x4000
Evaluating FCode...
Unhandled Exception 0x00000004ff702000
PC = 0x00000000ffd102b8 NPC = 0x00000000ffd102bc
Stopping execution

After updating to r624, I get the following:

0 > boot cdrom
[sparc64] Booting file 'cdrom' with parameters ''
Not a bootable ELF image
Not a Linux kernel image
Not a bootable a.out image
Loading FCode image...
Loaded 7120 bytes
entry point is 0x4000
Evaluating FCode...
Unhandled Exception 0x9000280200000000
PC = 0x00000000ffd0f05c NPC = 0x00000000ffd0f060
Stopping execution

When I use gdb to trace down the source of the exception, it looks like this:

(gdb) l *0x00000000ffd0f05c
0xffd0f05c is in enterforth (../kernel/internal.c:71).
66	#define dbg_internal_printk( a... ) printk( a )
67	#endif
68
69
70	static inline void processxt(ucell xt)
71	{
72	    void (*tokenp) (void);
73
74	    dbg_interp_printk("processxt: pc=%x, xt=%x\n", PC, xt);
75	    tokenp = words[xt];

-Nick
