Hi all,
With the previous commit, I seem to be getting to the point where files are now being read from the ISO into memory. However, on Milax things are still failing because the Fcode tries to set up a callback function repeated every second using the "alarm" word.
Does SPARC64 have a way of counting elapsed time (some kind of clock tick register) so that it would be possible to measure this in the main loop? Or alternatively set up some kind of timer interrupt after X ms?
Incidentally I notice that the latest Solaris code doesn't seem to use alarm anymore; Nick - does your Nevada boot get any further with current OpenBIOS SVN?
ATB,
Mark.
Mark Cave-Ayland wrote:
Hi all,
With the previous commit, I seem to be getting to the point where files are now being read from the ISO into memory. However, on Milax things are still failing because the Fcode tries to set up a callback function repeated every second using the "alarm" word.
I believe either the console or serial driver will do that, and possibly the secondary bootstrapper to cycle the -/| twirly baton.
Does SPARC64 have a way of counting elapsed time (some kind of clock tick register) so that it would be possible to measure this in the main loop?
Yup. It's usually available through "stick@" - I believe it counts nanoseconds.
Sorry I wasn't able to review your stuff, which looked reasonably correct, but I'm on vacation at the moment.
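A minimal sketch of the "measure it in the main loop" idea raised above, written in Python purely for illustration. It assumes a free-running counter that advances once per nanosecond (as suggested for "stick@", though that is stated above as a belief, not a guarantee) and an "alarm"-style registration taking an interval in milliseconds, with 0 meaning "stop". The class and function names are hypothetical and do not come from OpenBIOS.

```python
from dataclasses import dataclass
from typing import Callable, List

NS_PER_MS = 1_000_000  # assumption: the tick counter advances once per nanosecond


@dataclass
class Alarm:
    callback: Callable[[], None]
    interval_ns: int
    next_due_ns: int


class AlarmPoller:
    """Fires periodic callbacks by comparing a tick counter against deadlines."""

    def __init__(self) -> None:
        self.alarms: List[Alarm] = []

    def alarm(self, callback: Callable[[], None], interval_ms: int, now_ns: int) -> None:
        """Register callback every interval_ms; an interval of 0 removes it."""
        self.alarms = [a for a in self.alarms if a.callback is not callback]
        if interval_ms > 0:
            interval_ns = interval_ms * NS_PER_MS
            self.alarms.append(Alarm(callback, interval_ns, now_ns + interval_ns))

    def poll(self, now_ns: int) -> int:
        """Call once per main-loop iteration; returns how many callbacks fired."""
        fired = 0
        for a in self.alarms:
            if now_ns >= a.next_due_ns:
                a.callback()
                # Reschedule relative to "now" so a slow loop never fires in bursts.
                a.next_due_ns = now_ns + a.interval_ns
                fired += 1
        return fired


# Simulated tick readings stand in for the real counter.
ticks = []
baton = lambda: ticks.append("tick")
poller = AlarmPoller()
poller.alarm(baton, 1000, now_ns=0)
for now_ms in (500, 1000, 1500, 2000):
    poller.poll(now_ms * NS_PER_MS)
print(len(ticks))
```

Polling trades timing accuracy for simplicity: a callback can only fire as often as the main loop gets around to calling poll, which is probably acceptable for something like a once-per-second twirly baton.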
Tarl Neustaedter wrote:
I believe either the console or serial driver will do that, and possibly the secondary bootstrapper to cycle the -/| twirly baton.
Yeah. Checking against the online OpenSolaris source, it looks as if it's trying to do the twirly baton. Although interestingly the use of alarm seems to have been removed from the latest code browsable online.
Does SPARC64 have a way of counting elapsed time (some kind of clock tick register) so that it would be possible to measure this in the main loop?
Yup. It's usually available through "stick@" - I believe it counts nanoseconds.
Okay. I've had a quick grep through the Qemu sources and I can see that there is some mention of a tick command - I'm not quite sure what guarantees there are on timing/resolution in the Qemu implementation though. Blue, can you offer any advice here?
Sorry I wasn't able to review your stuff, which looked reasonably correct, but I'm on vacation at the moment.
I was fairly sure it was okay (as forthstrap is used to build itself) but wasn't sure at the time that I hadn't missed anything. However, I've performed tests using all of the control flow words with the debugger and everything seems to act as I would expect. Plus the Fcode evaluator now seems to resolve backwards branches correctly.
TBH I'm really grateful for the advice; it's nice to have someone around who knows what the behaviour *should* be as I only have physical access to 1 Solaris box, and it's being used to host various services and so I can't keep rebooting it all the time!
ATB,
Mark.
Mark Cave-Ayland wrote:
I was fairly sure it was okay (as forthstrap is used to build itself) but wasn't sure at the time that I hadn't missed anything. However, I've performed tests using all of the control flow words with the debugger and everything seems to act as I would expect. Plus the Fcode evaluator now seems to resolve backwards branches correctly.
Can you check those tests in, too? It might be time we start gathering a test suite beyond the Hayes tests...
Stefan
Stefan Reinauer wrote:
Can you check those tests in, too? It might be time we start gathering a test suite beyond the Hayes tests...
I'm not sure that there's much point checking in tests for control words, as forthstrap is fairly good at segfaulting before you can build anything to test if you get it wrong ;)
Things I would have found quite useful in the past are tests of all CIF words from both C and Forth, plus an extensive workout of the Fcode evaluator. However, at this point in the game I think I'm just about on top of how this all works, so I'm not sure it's currently the best use of my time... :/
ATB,
Mark.
Incidentally I notice that the latest Solaris code doesn't seem to use alarm anymore; Nick - does your Nevada boot get any further with current OpenBIOS SVN?
Nevada doesn't get any further, and I get the same error trying to boot the Milax 0.3.2 SPARC ISO. Trying to dig up some debugging information right now, but I'm running into an issue with none of the words being available for debugging until after the boot command has been issued, which of course then causes the unhandled exception...
-Nick
-------- This e-mail may contain confidential and privileged material for the sole use of the intended recipient. If this email is not intended for you, or you are not responsible for the delivery of this message to the intended recipient, please note that this message may contain SEAKR Engineering (SEAKR) Privileged/Proprietary Information. In such a case, you are strictly prohibited from downloading, photocopying, distributing or otherwise using this message, its contents or attachments in any way. If you have received this message in error, please notify us immediately by replying to this e-mail and delete the message from your mailbox. Information contained in this message that does not relate to the business of SEAKR is neither endorsed by nor attributable to SEAKR.
Nick Couchman wrote:
Incidentally I notice that the latest Solaris code doesn't seem to use alarm anymore; Nick - does your Nevada boot get any further with current OpenBIOS SVN?
Nevada doesn't get any further, and I get the same error trying to boot the Milax 0.3.2 SPARC ISO. Trying to dig up some debugging information right now, but I'm running into an issue with none of the words being available for debugging until after the boot command has been issued, which of course then causes the unhandled exception...
Right. If you're doing Nevada, the booter should respond to a "-H" (capital H) switch to abort the boot after defining the words. The boot command does essentially:
    open boot device
    read blocks 1-15
    byte-load the boot blocks just read in

The boot blocks themselves create a bunch of Forth code, mostly in /packages/ufs-file-system (or hsfs-file-system), and at the very end invoke the method "do-boot" that they just defined. This method checks the boot arguments for "-H" and, if it is present, aborts. If you can get that to happen, you can then set breakpoints, clear the boot arguments, clear the "halt?" flag, and re-invoke "do-boot".
Lessee... An email I sent internally on debugging the boot blocks is below. The details of the problems in the UFS code are probably uninteresting, but the techniques may be helpful:
---------------------------------------------------------------------
For what it's worth - some debugging technique I figure is worth writing down, particularly for our more recent engineers.
I spent the last week debugging the bootblocks (ufs-file-system, lives on blocks 1-15 of the root partition), and the experience of figuring out how to debug this was painful. We are probably going to do a bunch more debugging in this area as we make changes for extra-huge disks, EFI labels, and potentially even vxfs filesystem support. This may help the next victim down the road figure out how to debug this code - perhaps even me, given these details are all going to fall out of my brain next week.
The ufsboot code in question lives in /ws/onnv-clone/usr/src/psm/stand/bootblks/ufs/common/ufs.fth (plus various files in the immediate vicinity). Being that it's code that gets loaded from the disk, and I didn't want to try replacing it on the disk (the problem seemed to be very hard to get a reproducible case), I ended up figuring out how to debug it in situ.
First, it turns out that the bootblocks recognize "-H" as an indication that they should go back to the OK prompt after reading themselves in. So you can give the normal boot command with -H, and then fix things up after getting back to the ok prompt, and give the command "do-boot" to continue as if you hadn't said "-H".
In my case, I needed to patch a bunch of debug stuff into the ufs-block walking code, and the easiest way was to put stuff in nvramrc to patch it in. So we see in my nvramrc:
{0} ok printenv nvramrc
nvramrc = devalias bnet /pci@500/pci@0/pci@8/network@0:iscsi-target-ip=129.148.67.188,host-ip=129.148.67.111,iscsi-target-name=iqn.1986-03.com.sun:02:43644186-e5bb-41f2-8b8e-f34be1afaebc
: .xxx0 ." idir0 " ;
: .xxx1 ." idir1 " ;
: .inod ." inode " ;
: fixup
" /ufs-file-system" find-device
" : idir0 dup . .xxx0 get-indir0 indir0-adr 40 dump cr ; : idir1 dup . .xxx1 get-indir1 indir1-adr 40 dump cr ; : dir-dump inode /inode dump cr dup ; : .itod dup . itod .inod dup . cr ; " eval
" patch idir0 get-indir0 (bmap) patch idir0 get-indir0 (bmap) patch idir1 get-indir1 (bmap) patch dir-dump dup (bmap) patch .itod itod iget " eval
" /chosen" find-device
" bootargs" delete-property
0 0 " bootargs" property
" false to halt?" eval
;
I define a method "fixup" which I will invoke after boot -H exits to the ok prompt. This will add all my patches and clean up. The last four lines of fixup are to clear up the -H - I delete the existing bootargs property (which contains -H), create an empty property, and clear the ufsboot internal flag "halt?".
{0} ok boot bnet -H
Boot device: /pci@500/pci@0/pci@8/network@0:iscsi-target-ip=129.148.67.188,host-ip=129.148.67.111,iscsi-target-name=iqn.1986-03.com.sun:02:43644186-e5bb-41f2-8b8e-f34be1afaebc File and args: -H
/pci@500/pci@0/pci@8/network@0: 1000 Mbps full duplex link up
ufs-file-system Halted with -H flag.
The file just loaded does not appear to be executable.
{0} ok fixup
{0} ok do-boot
/pci@500/pci@0/pci@8/network@0: 1000 Mbps full duplex link up
[...]

I've highlighted in green the commands involved in each debug session (boot bnet -H, fixup and do-boot). Since I ended up going through this sequence around 100 times (with different debug code patched into nvramrc as understanding progressed), having it be relatively easy to type was crucial. I ended up not putting the call to "do-boot" inside fixup because I often tended to set breakpoints after invoking fixup.
What I ended up finding meant walking through UFS data structures, so I might as well walk through these as well. I ended up dumping inodes (file system nodes describing individual directory entries) and indirect blocks (which tell us where the disk blocks with content are). The first several of these are uninteresting; I'll start commenting where the interesting stuff occurs.
{0} ok do-boot
/pci@500/pci@0/pci@8/network@0: 1000 Mbps full duplex link up
2 inode 20
/ 1 2 3 4 5 6 7 8 9 a b c d e f v123456789abcdef
fd87e200 41 ed 00 15 00 00 00 00 00 00 00 00 00 00 02 00 Am..............
fd87e210 4a 8c 7a 6b 00 02 13 c4 4a 8c 85 6b 00 09 fd 89 J.zk...DJ..k..}.
fd87e220 4a 8c 85 6b 00 09 fd 89 00 00 03 08 00 00 00 00 J..k..}.........
fd87e230 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
fd87e240 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
fd87e250 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
fd87e260 00 00 00 00 00 00 00 00 00 00 00 02 2c d1 76 e8 ............,Qvh
fd87e270 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
b49 inode 188
/ 1 2 3 4 5 6 7 8 9 a b c d e f v123456789abcdef
fd87e200 41 ed 00 06 00 00 00 03 00 00 00 00 00 00 04 00 Am..............
fd87e210 4a 8c 85 6d 00 0e 97 38 4a 8c 80 f3 00 0c 98 26 J..m...8J..s...&
fd87e220 4a 8c 80 f3 00 0c 98 26 00 00 7f ee 00 00 00 00 J..s...&...n....
fd87e230 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
fd87e240 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
fd87e250 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
fd87e260 00 00 00 00 00 00 00 00 00 00 00 02 52 1c 08 5d ............R..]
fd87e270 00 00 00 00 00 00 00 00 00 00 00 03 00 00 00 00 ................
bfd inode 198
/ 1 2 3 4 5 6 7 8 9 a b c d e f v123456789abcdef
fd87e200 a1 ff 00 01 00 00 00 00 00 00 00 00 00 00 00 05 !...............
fd87e210 4a 8c 85 8b 00 08 c8 e3 4a 8c 7a 39 00 09 4f df J.....HcJ.z9..O_
fd87e220 4a 8c 7a 39 00 09 52 cd 00 00 82 0a 00 00 00 00 J.z9..RM........
fd87e230 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
fd87e240 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
fd87e250 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
fd87e260 00 00 00 00 00 00 00 00 00 00 00 02 3f 4a 14 08 ............?J..
fd87e270 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
b49 inode 188
/ 1 2 3 4 5 6 7 8 9 a b c d e f v123456789abcdef
fd87e200 41 ed 00 06 00 00 00 03 00 00 00 00 00 00 04 00 Am..............
fd87e210 4a 8c 85 6d 00 0e 97 38 4a 8c 80 f3 00 0c 98 26 J..m...8J..s...&
fd87e220 4a 8c 80 f3 00 0c 98 26 00 00 7f ee 00 00 00 00 J..s...&...n....
fd87e230 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
fd87e240 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
fd87e250 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
fd87e260 00 00 00 00 00 00 00 00 00 00 00 02 52 1c 08 5d ............R..]
fd87e270 00 00 00 00 00 00 00 00 00 00 00 03 00 00 00 00 ................
b4a inode 188
Loading: /platform/SUNW,T5240/boot_archive
This was the file, boot_archive, that was getting clobbered - rather, read incorrectly. It turns out the inode for it is "c", down below.
2 inode 20
/ 1 2 3 4 5 6 7 8 9 a b c d e f v123456789abcdef
fd87e200 41 ed 00 15 00 00 00 00 00 00 00 00 00 00 02 00 Am..............
fd87e210 4a 8c 7a 6b 00 02 13 c4 4a 8c 85 6b 00 09 fd 89 J.zk...DJ..k..}.
fd87e220 4a 8c 85 6b 00 09 fd 89 00 00 03 08 00 00 00 00 J..k..}.........
fd87e230 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
fd87e240 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
fd87e250 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
fd87e260 00 00 00 00 00 00 00 00 00 00 00 02 2c d1 76 e8 ............,Qvh
fd87e270 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
b49 inode 188
/ 1 2 3 4 5 6 7 8 9 a b c d e f v123456789abcdef
fd87e200 41 ed 00 06 00 00 00 03 00 00 00 00 00 00 04 00 Am..............
fd87e210 4a 8c 85 6d 00 0e 97 38 4a 8c 80 f3 00 0c 98 26 J..m...8J..s...&
fd87e220 4a 8c 80 f3 00 0c 98 26 00 00 7f ee 00 00 00 00 J..s...&...n....
fd87e230 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
fd87e240 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
fd87e250 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
fd87e260 00 00 00 00 00 00 00 00 00 00 00 02 52 1c 08 5d ............R..]
fd87e270 00 00 00 00 00 00 00 00 00 00 00 03 00 00 00 00 ................
bfd inode 198
/ 1 2 3 4 5 6 7 8 9 a b c d e f v123456789abcdef
fd87e200 a1 ff 00 01 00 00 00 00 00 00 00 00 00 00 00 05 !...............
fd87e210 4a 8c 85 8b 00 08 c8 e3 4a 8c 7a 39 00 09 4f df J.....HcJ.z9..O_
fd87e220 4a 8c 7a 39 00 09 52 cd 00 00 82 0a 00 00 00 00 J.z9..RM........
fd87e230 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
fd87e240 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
fd87e250 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
fd87e260 00 00 00 00 00 00 00 00 00 00 00 02 3f 4a 14 08 ............?J..
fd87e270 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
b49 inode 188
/ 1 2 3 4 5 6 7 8 9 a b c d e f v123456789abcdef
fd87e200 41 ed 00 06 00 00 00 03 00 00 00 00 00 00 04 00 Am..............
fd87e210 4a 8c 85 6d 00 0e 97 38 4a 8c 80 f3 00 0c 98 26 J..m...8J..s...&
fd87e220 4a 8c 80 f3 00 0c 98 26 00 00 7f ee 00 00 00 00 J..s...&...n....
fd87e230 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
fd87e240 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
fd87e250 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
fd87e260 00 00 00 00 00 00 00 00 00 00 00 02 52 1c 08 5d ............R..]
fd87e270 00 00 00 00 00 00 00 00 00 00 00 03 00 00 00 00 ................
b4a inode 188
/ 1 2 3 4 5 6 7 8 9 a b c d e f v123456789abcdef
fd87e200 41 ed 00 04 00 00 00 03 00 00 00 00 00 00 02 00 Am..............
fd87e210 4a 8c 7a 38 00 0d a6 cf 4a 8c 85 a1 00 00 53 48 J.z8..&OJ..!..SH
fd87e220 4a 8c 85 a1 00 00 53 48 00 00 7f ef 00 00 00 00 J..!..SH...o....
fd87e230 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
fd87e240 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
fd87e250 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
fd87e260 00 00 00 00 00 00 00 00 00 00 00 02 42 eb a7 4d ............Bk'M
fd87e270 00 00 00 00 00 00 00 00 00 00 00 03 00 00 00 00 ................
c inode 20
|
/ 1 2 3 4 5 6 7 8 9 a b c d e f v123456789abcdef
fd87e200 81 a4 00 01 00 00 00 00 00 00 00 00 11 b3 e8 00 .$...........3h.
fd87e210 4a 8c 85 8c 00 03 24 91 4a 8c 85 a0 00 06 11 f6 J.....$.J.. ...v
fd87e220 4a 8c 85 a1 00 00 53 49 00 00 03 88 00 00 03 90 J..!..SI........
fd87e230 00 00 03 98 00 00 03 a0 00 00 03 a8 00 00 03 b0 ....... ...(...0
fd87e240 00 00 03 b8 00 00 03 c0 00 00 03 c8 00 00 03 d0 ...8...@...H...P
fd87e250 00 00 03 d8 00 00 03 e0 00 52 ee 00 00 6a 2d 98 ...X...`.Rn..j-.
fd87e260 00 00 00 00 00 00 00 00 00 08 db 30 38 24 ea 64 ..........[08$jd
fd87e270 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
This was the inode describing the file that had a problem. One of the things I learned in this process is that the inode contains 12 direct data block pointers (colored purple above: the twelve 32-bit words from 00 00 03 88 through 00 00 03 e0), which are actually cluster-block numbers. You take the number presented, shift it by the contents of superblock offset 0x64, and that gives you the relative block number in the partition. On the above disk, the partition was set up for two-block clusters, giving us 0x400 bytes per pointer, or 12 KiB of data before we had to go to the next larger chunk of data.
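As an illustration of reading those pointers out of the dump, here is a small Python sketch. It is not part of the original debugging session. The offsets 0x28 (direct pointers) and 0x58 (indirect pointers) are assumptions read off the dump above, and the 0x400-byte unit is the figure quoted in the message for this particular disk.

```python
import struct

# The 128-byte inode dump for inode "c", copied from the dump above.
INODE_HEX = """
81 a4 00 01 00 00 00 00 00 00 00 00 11 b3 e8 00
4a 8c 85 8c 00 03 24 91 4a 8c 85 a0 00 06 11 f6
4a 8c 85 a1 00 00 53 49 00 00 03 88 00 00 03 90
00 00 03 98 00 00 03 a0 00 00 03 a8 00 00 03 b0
00 00 03 b8 00 00 03 c0 00 00 03 c8 00 00 03 d0
00 00 03 d8 00 00 03 e0 00 52 ee 00 00 6a 2d 98
00 00 00 00 00 00 00 00 00 08 db 30 38 24 ea 64
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
"""

DIRECT_OFFSET = 0x28    # assumption: where the 12 direct pointers start
INDIRECT_OFFSET = 0x58  # assumption: where the indirect pointers start
UNIT_BYTES = 0x400      # bytes per pointer unit on this disk, per the message

inode = bytes.fromhex("".join(INODE_HEX.split()))

# SPARC is big-endian, so the pointers are big-endian 32-bit words.
direct = struct.unpack_from(">12I", inode, DIRECT_OFFSET)
indirect = struct.unpack_from(">3I", inode, INDIRECT_OFFSET)

for i, ptr in enumerate(direct):
    print(f"direct[{i:2d}] = {ptr:#x} -> partition byte offset {ptr * UNIT_BYTES:#x}")
for i, ptr in enumerate(indirect):
    print(f"indirect[{i}] = {ptr:#x}")
```

The two non-zero indirect entries come out as 52ee00 and 6a2d98, the same block numbers that label the idir0 dumps that follow.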
/ 1 2 3 4 5 6 7 8 9 a b c d e f v123456789abcdef
fd87e200 81 a4 00 01 00 00 00 00 00 00 00 00 11 b3 e8 00 .$...........3h.
fd87e210 4a 8c 85 8c 00 03 24 91 4a 8c 85 a0 00 06 11 f6 J.....$.J.. ...v
fd87e220 4a 8c 85 a1 00 00 53 49 00 00 03 88 00 00 03 90 J..!..SI........
fd87e230 00 00 03 98 00 00 03 a0 00 00 03 a8 00 00 03 b0 ....... ...(...0
fd87e240 00 00 03 b8 00 00 03 c0 00 00 03 c8 00 00 03 d0 ...8...@...H...P
fd87e250 00 00 03 d8 00 00 03 e0 00 52 ee 00 00 6a 2d 98 ...X...`.Rn..j-.
fd87e260 00 00 00 00 00 00 00 00 00 08 db 30 38 24 ea 64 ..........[08$jd
fd87e270 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
52ee00 idir0
/ 1 2 3 4 5 6 7 8 9 a b c d e f v123456789abcdef
fd87a000 00 52 ee 08 00 52 ee 10 00 52 ee 18 00 52 ee 20 .Rn..Rn..Rn..Rn
fd87a010 00 52 ee 28 00 52 ee 30 00 52 ee 38 00 52 ee 40 .Rn(.Rn0.Rn8.Rn@
fd87a020 00 52 ee 48 00 52 ee 50 00 52 ee 58 00 52 ee 60 .RnH.RnP.RnX.Rn`
fd87a030 00 52 ee 68 00 52 ee 70 00 52 ee 78 00 52 ee 80 .Rnh.Rnp.Rnx.Rn.
After the twelve direct pointers, there is an indirect pointer (colored brown above: 00 52 ee 00), which points to an entire block of data pointers. That gives us several hundred KiB before we have to go to the next level of indirection. The start of the block it points to (number 52ee00) is dumped out immediately after the inode, and you see the start of an array of pointers. Reading that takes a while.
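The lookup order described so far (direct pointers first, then one level of indirection), extended by one more level in the same pattern, can be sketched as follows. This is an illustrative Python model, not the bootblock's (bmap) code; the function name is hypothetical, and the number of pointers per indirect block is left as a parameter because it depends on the filesystem's block size.

```python
NDADDR = 12  # direct pointers per inode, as stated in the message


def classify_block(lbn: int, nindir: int):
    """Locate logical block number lbn of a file.

    nindir is the number of pointers held by one indirect block
    (a hypothetical parameter here). Returns (level, indices).
    """
    if lbn < 0:
        raise ValueError("negative block number")
    if lbn < NDADDR:
        return ("direct", (lbn,))
    lbn -= NDADDR
    if lbn < nindir:
        # One lookup: index into the single-indirect block.
        return ("single", (lbn,))
    lbn -= nindir
    if lbn < nindir * nindir:
        # Two lookups: pick a second-level block, then an entry inside it.
        return ("double", divmod(lbn, nindir))
    raise ValueError("beyond the double-indirect range")


# Tiny indirect blocks (4 pointers) keep the example readable.
for lbn in (0, 11, 12, 15, 16, 21):
    print(lbn, classify_block(lbn, nindir=4))
```

In the "double" case the first index selects an entry of the first-level pointer block and the second selects an entry within the block that entry points to.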
/ 1 2 3 4 5 6 7 8 9 a b c d e f v123456789abcdef
fd87e200 81 a4 00 01 00 00 00 00 00 00 00 00 11 b3 e8 00 .$...........3h.
fd87e210 4a 8c 85 8c 00 03 24 91 4a 8c 85 a0 00 06 11 f6 J.....$.J.. ...v
fd87e220 4a 8c 85 a1 00 00 53 49 00 00 03 88 00 00 03 90 J..!..SI........
fd87e230 00 00 03 98 00 00 03 a0 00 00 03 a8 00 00 03 b0 ....... ...(...0
fd87e240 00 00 03 b8 00 00 03 c0 00 00 03 c8 00 00 03 d0 ...8...@...H...P
fd87e250 00 00 03 d8 00 00 03 e0 00 52 ee 00 00 6a 2d 98 ...X...`.Rn..j-.
fd87e260 00 00 00 00 00 00 00 00 00 08 db 30 38 24 ea 64 ..........[08$jd
fd87e270 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
6a2d98 idir0
/ 1 2 3 4 5 6 7 8 9 a b c d e f v123456789abcdef
fd87a000 00 6a 2d a0 00 7f c4 e8 00 7f c5 d8 00 80 46 08 .j- ..Dh..EX..F.
fd87a010 00 80 88 c8 00 81 00 08 00 81 42 f0 00 81 82 f8 ...H......Bp...x
fd87a020 00 80 c8 d0 00 63 89 00 00 81 ce c0 00 82 0e c8 ..HP.c....N@...H
fd87a030 00 82 80 08 00 82 c2 f0 00 83 02 f8 00 83 43 08 ......Bp...x..C.
6a2da0 idir1
/ 1 2 3 4 5 6 7 8 9 a b c d e f v123456789abcdef
fd87c000 00 6a 2d a8 00 6a 2d b0 00 6a 2d b8 00 6a 2d c0 .j-(.j-0.j-8.j-@
fd87c010 00 6a 2d c8 00 6a 2d d0 00 6a 2d d8 00 6a 2d e0 .j-H.j-P.j-X.j-`
fd87c020 00 6a 2d e8 00 6a 2d f0 00 6a 2d f8 00 6a 2e 00 .j-h.j-p.j-x.j..
fd87c030 00 6a 2e 08 00 6a 2e 10 00 6a 2e 18 00 6a 2e 20 .j...j...j...j.
The next pointer (colored green above: 00 6a 2d 98) is a pointer to an entire block of pointers to indirect blocks. That gives us several hundred megabytes before we need to go to the next level. In this case, we follow the first indirect pointer (colored blue here: 00 6a 2d a0) to the second-level list of blocks. This one worked without any trouble and spent some time reading, so we went to the next indirect pointer:
|
/ 1 2 3 4 5 6 7 8 9 a b c d e f v123456789abcdef
fd87e200 81 a4 00 01 00 00 00 00 00 00 00 00 11 b3 e8 00 .$...........3h.
fd87e210 4a 8c 85 8c 00 03 24 91 4a 8c 85 a0 00 06 11 f6 J.....$.J.. ...v
fd87e220 4a 8c 85 a1 00 00 53 49 00 00 03 88 00 00 03 90 J..!..SI........
fd87e230 00 00 03 98 00 00 03 a0 00 00 03 a8 00 00 03 b0 ....... ...(...0
fd87e240 00 00 03 b8 00 00 03 c0 00 00 03 c8 00 00 03 d0 ...8...@...H...P
fd87e250 00 00 03 d8 00 00 03 e0 00 52 ee 00 00 6a 2d 98 ...X...`.Rn..j-.
fd87e260 00 00 00 00 00 00 00 00 00 08 db 30 38 24 ea 64 ..........[08$jd
fd87e270 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
6a2d98 idir0
/ 1 2 3 4 5 6 7 8 9 a b c d e f v123456789abcdef
fd87a000 00 6a 2d a0 00 7f c4 e8 00 7f c5 d8 00 80 46 08 .j- ..Dh..EX..F.
fd87a010 00 80 88 c8 00 81 00 08 00 81 42 f0 00 81 82 f8 ...H......Bp...x
fd87a020 00 80 c8 d0 00 63 89 00 00 81 ce c0 00 82 0e c8 ..HP.c....N@...H
fd87a030 00 82 80 08 00 82 c2 f0 00 83 02 f8 00 83 43 08 ......Bp...x..C.
7fc4e8 idir1
/ 1 2 3 4 5 6 7 8 9 a b c d e f v123456789abcdef
fd87c000 28 6e 6f 75 6e 29 7c 63 6f 6e 67 6f 20 73 6e 61 (noun)|congo sna
fd87c010 6b 65 7c 63 6f 6e 67 6f 20 65 65 6c 7c 62 6c 69 ke|congo eel|bli
fd87c020 6e 64 20 65 65 6c 7c 73 61 6c 61 6d 61 6e 64 65 nd eel|salamande
fd87c030 72 20 28 67 65 6e 65 72 69 63 20 74 65 72 6d 29 r (generic term)
seek failed
ok
The above was the failure scenario. We followed the same indirect pointer (green) to the second indirect pointer (listed in red here: 00 7f c4 e8), and that got us garbage. Running fsck didn't seem to find any errors, so I replicated the above process under Solaris to see where I was getting different data than it was. It turned out to be a truncation error due to 32/64-bit issues:
Pointer 7fc4e8 is in 0x400-byte blocks (due to fsbtodbc containing 1), so this translates to byte address 1.ff13a.000 on partition 0.
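The conversion above can be checked with a few lines of Python. This is only a worked example of the arithmetic; the 512-byte disk block size is an assumption (it is what makes a shift of 1 yield 0x400-byte units), and the function name is hypothetical.

```python
FSBTODB_SHIFT = 1   # the shift value quoted in the message
SECTOR_BYTES = 512  # assumption: 512-byte disk blocks


def fs_pointer_to_byte_offset(ptr: int) -> int:
    """Convert a filesystem block pointer to a byte offset in the partition."""
    disk_block = ptr << FSBTODB_SHIFT   # filesystem units -> disk blocks
    return disk_block * SECTOR_BYTES    # disk blocks -> bytes


offset = fs_pointer_to_byte_offset(0x7FC4E8)
print(hex(offset))           # the byte address used with the dump tool
print(offset.bit_length())   # number of bits needed to hold that address
```

The point worth noticing is the bit length: the resulting address no longer fits in 32 bits, so any step that handles it as a 32-bit quantity will silently drop the top bit.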
# /home/tarl/tools/bin.sparc/dump /dev/rdsk/c2t600144F0800A890000004A7378820001d0s0 1ff13a000 10
1FF13A000 007f c4f0 007f c5f8 007f c600 007f c608 ................
1FF13A010 007f c610 007f c618 007f c620 007f c628 ........... ...(
1FF13A020 007f c630 007f c638 007f c640 007f c648 ...0...8...@...H
1FF13A030 007f c650 007f c658 007f c660 007f c668 ...P...X...`...h
1FF13A040 007f c670 007f c678 007f c680 007f c688 ...p...x........
1FF13A050 007f c690 007f c698 007f c6a0 007f c6a8 ................
1FF13A060 007f c6b0 007f c6b8 007f c6c0 007f c6c8 ................
1FF13A070 007f c6d0 007f c6d8 007f c6e0 007f c6e8 ................
1FF13A080 007f c6f0 007f c6f8 007f c700 007f c708 ................
1FF13A090 007f c710 007f c718 007f c720 007f c728 ........... ...(
1FF13A0A0 007f c730 007f c738 007f c740 007f c748 ...0...8...@...H
1FF13A0B0 007f c750 007f c758 007f c760 007f c768 ...P...X...`...h
1FF13A0C0 007f c770 007f c778 007f c780 007f c788 ...p...x........
1FF13A0D0 007f c790 007f c798 007f c7a0 007f c7a8 ................
1FF13A0E0 007f c7b0 007f c7b8 007f c7c0 007f c7c8 ................
1FF13A0F0 007f c7d0 007f c7d8 007f c7e0 007f c7e8 ................
Hmm. That's not the data I got above. After some serendipity (I was using a firmworks forth simulator for arithmetic and it accidentally truncated a value on me), I found:
# /home/tarl/tools/bin.sparc/dump /dev/rdsk/c2t600144F0800A890000004A7378820001d0s0 ff13a000 10
FF13A000 286e 6f75 6e29 7c63 6f6e 676f 2073 6e61 (noun)|congo sna
FF13A010 6b65 7c63 6f6e 676f 2065 656c 7c62 6c69 ke|congo eel|bli
FF13A020 6e64 2065 656c 7c73 616c 616d 616e 6465 nd eel|salamande
FF13A030 7220 2867 656e 6572 6963 2074 6572 6d29 r (generic term)
FF13A040 0a61 6d70 6869 756d 6964 6165 7c31 0a28 .amphiumidae|1.(
FF13A050 6e6f 756e 297c 416d 7068 6975 6d69 6461 noun)|Amphiumida
FF13A060 657c 6661 6d69 6c79 2041 6d70 6869 756d e|family Amphium
FF13A070 6964 6165 7c61 6d70 6869 6269 616e 2066 idae|amphibian f
FF13A080 616d 696c 7920 2867 656e 6572 6963 2074 amily (generic t
FF13A090 6572 6d29 0a61 6d70 686f 7261 7c31 0a28 erm).amphora|1.(
FF13A0A0 6e6f 756e 297c 6a61 7220 2867 656e 6572 noun)|jar (gener
FF13A0B0 6963 2074 6572 6d29 0a61 6d70 686f 7269 ic term).amphori
FF13A0C0 637c 310a 2861 646a 297c 6c69 7374 656e c|1.(adj)|listen
FF13A0D0 696e 677c 6865 6172 696e 677c 6469 6167 ing|hearing|diag
FF13A0E0 6e6f 7374 6963 2070 726f 6365 6475 7265 nostic procedure
FF13A0F0 7c64 6961 676e 6f73 7469 6320 7465 6368 |diagnostic tech
Yup. That's the garbage I was getting. So we somehow lost a high-order bit between UFS and iSCSI CDB generation. It turns out to have been a "d+" in 64-bit mode rather than 32-bit mode, so carries weren't being moved along. But I rather suspect this won't be the last problem we trip over in the boot blocks.
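One plausible way a "d+" can lose a carry like this is sketched below in Python. It is a model under stated assumptions, not the actual bootblock code: it assumes the address was carried as a (low, high) pair of cells meant to hold 32 bits each, and the two operands are hypothetical values chosen only because they sum to the address seen in this thread.

```python
MASK32 = 0xFFFFFFFF


def d_plus(a, b, cell_bits):
    """Model of a double-cell add: a and b are (low, high) pairs of cells."""
    mask = (1 << cell_bits) - 1
    low = a[0] + b[0]
    carry = low >> cell_bits            # non-zero only if the low cell overflows
    high = (a[1] + b[1] + carry) & mask
    return (low & mask, high)


def pack_32_32(d):
    """Join the low 32 bits of each cell, as code written for 32-bit cells would."""
    return ((d[1] & MASK32) << 32) | (d[0] & MASK32)


# Hypothetical operands; only their sum (0x1FF13A000) comes from the thread.
a = (0xFFF00000, 0)
b = (0xFF23A000, 0)

print(hex(pack_32_32(d_plus(a, b, cell_bits=32))))  # carry reaches the high cell
print(hex(pack_32_32(d_plus(a, b, cell_bits=64))))  # bit 32 stays in the low cell and is dropped
```

With 64-bit cells the low cell never overflows, so no carry is propagated; the extra bit sits at position 32 of the low cell and disappears as soon as only the low 32 bits are used.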
On 2009/12/08 at 21:22, Tarl Neustaedter Tarl.Neustaedter@Sun.COM wrote:
Nick Couchman wrote:
Incidentally I notice that the latest Solaris code doesn't seem to use alarm anymore; Nick - does your Nevada boot get any further with current OpenBIOS SVN?
Nevada doesn't get any further, and I get the same error trying to boot the Milax 0.3.2 SPARC ISO. Trying to dig up some debugging information right now, but I'm running into an issue with none of the words being available for debugging until after the boot command has been issued, which of course then causes the unhandled exception...
Right. If you're doing Nevada, the booter should respond to a "-H" (capital H) switch to abort the boot after defining the words. The boot command does essentially:
I don't know whether it's OpenBIOS or Qemu, or the exception happens before the -H argument can take place, but "boot cdrom -H" doesn't seem to have any effect - I get exactly the same behavior whether I use the -H flag or not. I'll take a look at the rest of your debugging tips and try to put them to use - thanks!!
-Nick
Nick Couchman wrote:
[...] I don't know whether it's OpenBIOS or Qemu, or the exception happens before the -H argument can take place, but "boot cdrom -H" doesn't seem to have any effect - I get exactly the same behavior whether I use the -H flag or not. I'll take a look at the rest of your debugging tips and try to put them to use - thanks!!
I suspect it's not failing *before* checking for -H, just that the check isn't working.
Try creating a property in chosen called "bootargs" containing "-H" before giving the boot command. Maybe OpenBIOS is doing something different with boot arguments. E.g.:
ok dev /chosen
ok " -H" encode-string " bootargs" property
ok boot <stuff>
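For readers unfamiliar with property encoding, the following Python sketch models what the suggested commands store. It assumes that encode-string produces the string's bytes followed by a terminating NUL, and it uses a plain dictionary as a stand-in for the /chosen node. The helper that looks for "-H" is hypothetical; how the boot blocks actually parse the arguments is not shown in this thread.

```python
def encode_string(s: str) -> bytes:
    """Model of encode-string: the string's bytes plus a terminating NUL."""
    return s.encode("ascii") + b"\x00"


def has_halt_flag(bootargs: bytes) -> bool:
    """Hypothetical check: is -H one of the whitespace-separated arguments?"""
    text = bootargs.rstrip(b"\x00").decode("ascii")
    return "-H" in text.split()


chosen = {}                                  # stand-in for the /chosen node
chosen["bootargs"] = encode_string("-H")     # " -H" encode-string " bootargs" property

print(chosen["bootargs"])
print(has_halt_flag(chosen["bootargs"]))
```

In the Forth source the space after the opening quote is only a delimiter, so the stored string is "-H" without a leading space.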
On 2009/12/08 at 21:39, Tarl Neustaedter Tarl.Neustaedter@Sun.COM wrote:
Nick Couchman wrote:
[...] I don't know whether it's OpenBIOS or Qemu, or the exception happens before the -H argument can take place, but "boot cdrom -H" doesn't seem to have any effect - I get exactly the same behavior whether I use the -H flag or not. I'll take a look at the rest of your debugging tips and try to put them to use - thanks!!
I suspect it's not failing *before* checking for -H, just that the check isn't working.
Try creating a property in chosen called "bootargs" containing "-H" before giving the boot command. Maybe OpenBIOS is doing something different with boot arguments. E.g.:
ok dev /chosen
ok " -H" encode-string " bootargs" property
ok boot <stuff>
Aha! Worked great! Now I can do debug do-boot. However, once I've created this and set up the debugging, how do I clear it out so that it doesn't halt the do-boot process?
Thanks! -Nick
Nick Couchman wrote:
I suspect it's not failing *before* checking for -H, just that the check isn't working.
Ah okay - is this a bug in the OpenBIOS implementation? My process for debugging so far has been to turn off autoboot in OpenBIOS and then stick a breakpoint on one of the CIF methods and go from there; however this method seems much better!
Try creating a property in chosen called "bootargs" containing "-H" before giving the boot command. Maybe OpenBIOS is doing something different with boot arguments. E.g.:
ok dev /chosen
ok " -H" encode-string " bootargs" property
ok boot <stuff>
Aha! Worked great! Now I can do debug do-boot. However, once I've created this and set up the debugging, how do I clear it out so that it doesn't halt the do-boot process?
I think you should be able to clear it (set it to an empty string) with:
dev /chosen " " encode-string " bootargs" property
followed by:
boot cdrom
HTH,
Mark.
Nick Couchman wrote:
[...]
ok dev /chosen
ok " -H" encode-string " bootargs" property
ok boot <stuff>
Aha! Worked great! Now I can do debug do-boot. However, once I've created this and set up the debugging, how do I clear it out so that it doesn't halt the do-boot process?
Just clobber the bootargs property and the halt? flag:
ok dev /chosen
ok " bootargs" delete-property
ok 0 0 " bootargs" property
ok false to halt?
ok do-boot
You might want to define the above in nvramrc inside a method, because you'll be doing it a lot :-) Something like having nvramrc contain:
: fixup
   " /chosen" find-device
   " bootargs" delete-property
   0 0 " bootargs" property
   " false to halt?" eval
;
That last is inside an "eval" string because halt? won't be defined at nvramrc time.
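The reason both the property and the flag need clearing can be illustrated with a toy Python model. It assumes that "halt?" is set when "-H" is seen and is not reset by a later call to do-boot, which is consistent with the advice above but is not confirmed by any source shown in this thread. The class is hypothetical and only mirrors the names used in the messages.

```python
class BootBlockModel:
    """Toy model of the -H handling described above (assumptions noted inline)."""

    def __init__(self) -> None:
        self.chosen = {}       # stand-in for the /chosen node's properties
        self.halt = False      # stand-in for the boot blocks' halt? flag

    def do_boot(self) -> str:
        args = self.chosen.get("bootargs", b"").rstrip(b"\x00").decode("ascii")
        if "-H" in args.split():
            self.halt = True   # assumption: the flag is sticky once set
        return "halted" if self.halt else "booting"

    def fixup(self) -> None:
        self.chosen.pop("bootargs", None)   # " bootargs" delete-property
        self.chosen["bootargs"] = b""       # 0 0 " bootargs" property
        self.halt = False                   # false to halt?


m = BootBlockModel()
m.chosen["bootargs"] = b"-H\x00"
first = m.do_boot()            # halts, as with "boot ... -H"

m.chosen["bootargs"] = b""     # clearing only the property...
second = m.do_boot()           # ...still halts in this model, because halt? is set

m.fixup()
third = m.do_boot()            # proceeds once both are cleared
print(first, second, third)
```

Under that assumption, emptying bootargs alone is not enough when re-invoking do-boot directly; issuing a fresh boot command is a different case, since it reloads the boot blocks from scratch.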
On 2009/12/09 at 06:53, Tarl Neustaedter Tarl.Neustaedter@Sun.COM wrote:
Nick Couchman wrote:
[...]
ok dev /chosen
ok " -H" encode-string " bootargs" property
ok boot <stuff>
Aha! Worked great! Now I can do debug do-boot. However, once I've created this and set up the debugging, how do I clear it out so that it doesn't halt the do-boot process?
Just clobber the bootargs property and the halt? flag:
ok dev /chosen
ok " bootargs" delete-property
ok 0 0 " bootargs" property
ok false to halt?
ok do-boot
Great - thanks!
-Nick