On Fri, 2011-04-22 at 08:08 +0100, Mark Cave-Ayland wrote:
With my previously posted patch applied:
Wow. All I can say is thank you so much to everyone who helped me get this far - in particular Blue, Artyom and Tarl. Does anyone know if we are the first non-Sun firmware to be able to boot a Solaris kernel?
Blue - I think you owe me that 1.1 release soon!
Happy Easter everyone!
Mark.
Thanks, again, Mark - this is fantastic!
After booting the install successfully from the Solaris ISO/CDROM and getting through the initial configuration (on both Solaris 8 and 9), the install fails trying to label the hard drive. First, I see the following during the installer boot:
WARNING: /iommu@0,10000000/sbus@0,10001000/espdma@5,8400000/esp@5,8800000/sd@0,0 (sd0): corrupt label - wrong magic number
Then, after doing the system identification, I get the following:
One or more disks are found, but one of the following problems exists:
> Hardware failure
> Unformatted disk.
At which point I get dropped to a shell. If I try to use the format command from the shell, format cannot auto-configure the disk. Even if I try manually setting C/H/S parameters (either at qemu run time or in the format command), it does not seem to want to use the disk. This seems to happen whether I'm using an LVM volume or a file-based disk. Also, it seems as though all changes to the disk are lost when I exit format and then immediately go back in.
Ideas? Any further help/input/debugging I could provide?
-Nick
On 22/04/11 20:58, Nick Couchman wrote:
Thanks, again, Mark - this is fantastic!
After booting the install successfully from the Solaris ISO/CDROM and getting through the initial configuration (on both Solaris 8 and 9), the install fails trying to label the hard drive. First, I see the following during the installer boot:
WARNING: /iommu@0,10000000/sbus@0,10001000/espdma@5,8400000/esp@5,8800000/sd@0,0 (sd0): corrupt label - wrong magic number
Then, after doing the system identification, I get the following:
One or more disks are found, but one of the following problems exists:
> Hardware failure
> Unformatted disk.
At which point I get dropped to a shell. If I try to use the format command from the shell, format cannot auto-configure the disk. Even if I try manually setting C/H/S parameters (either at qemu run time or in the format command), it does not seem to want to use the disk. This seems to happen whether I'm using an LVM volume or a file-based disk. Also, it seems as though all changes to the disk are lost when I exit format and then immediately go back in.
Ideas? Any further help/input/debugging I could provide?
Hi Nick,
Thanks again for testing this. The issue with Solaris not detecting the disks is a known one - the following guide should point you in the right direction:
http://virtuallyfun.blogspot.com/2010/10/formatting-disks-for-solaris.html
Also don't forget the "set scsi_options" part from Artyom's SPARC/QEMU howto here: http://tyom.blogspot.com/2009/12/solaris-under-qemu-how-to.html
I think it may actually be possible to come up with a patch for OpenBIOS so that the scsi_options change isn't required - let me know how you get on with the above links, and if it all seems to work I'll look at creating the patch for you.
ATB,
Mark.
On Sat, Apr 23, 2011 at 3:55 PM, Mark Cave-Ayland mark.cave-ayland@siriusit.co.uk wrote:
Also don't forget the "set scsi_options" part from Artyom's SPARC/QEMU howto here: http://tyom.blogspot.com/2009/12/solaris-under-qemu-how-to.html
I think it may actually be possible to come up with a patch for OpenBIOS so that the scsi_options change isn't required - let me know how you get on with the above links, and if it all seems to work I'll look at creating the patch for you.
I suspect that it's a qemu esp bug. Either Solaris recognizes a wrong esp chipset version, or qemu esp doesn't implement some necessary features of the esp version it claims to be. In the latter case we should fix qemu, not OpenBIOS. If you can find out what Solaris is expecting, I would try to implement it in esp.
On 26/04/11 10:30, Artyom Tarasenko wrote:
I suspect that it's a qemu esp bug. Either Solaris recognizes a wrong esp chipset version, or qemu esp doesn't implement some necessary features of the esp version it claims to be. In the latter case we should fix qemu, not OpenBIOS. If you can find out what Solaris is expecting, I would try to implement it in esp.
Ah okay. What does that change to scsi_options actually *do*? I've seen reference to it in your HOWTO, but I don't see any explanation as to why it is needed and/or what symptoms you were seeing and how you determined this was the correct workaround?
ATB,
Mark.
Ah okay. What does that change to scsi_options actually *do*?
From glancing at the include file (usr/src/uts/sys/scsi/conf/autoconf.h, should be in the exported opensolaris), the bits 0x58 enable:
- Global disconnect/reconnect
- Global Linked Commands
- Global Parity Support
Notably it does *not* include global synchronous transfer capability, tagged command support or any of the fast/wide possibilities. I can't find where the default scsi_options are for the esp driver, but presumably it had some of those capabilities enabled, which aren't supported by the qemu scsi driver.
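To make the bit arithmetic concrete, here is a minimal sketch of how 0x58 decomposes. The flag values are as I recall them from the header mentioned above (sys/scsi/conf/autoconf.h) and should be checked against the real file; the helper name scsi_option_enabled() is purely illustrative and is not part of Solaris.

```c
#include <stdio.h>

/*
 * Flag values recalled from sys/scsi/conf/autoconf.h (assumption:
 * verify against the real header before relying on them).
 */
#define SCSI_OPTIONS_DR     0x008  /* global disconnect/reconnect */
#define SCSI_OPTIONS_LINK   0x010  /* global linked commands */
#define SCSI_OPTIONS_SYNC   0x020  /* global synchronous transfer */
#define SCSI_OPTIONS_PARITY 0x040  /* global parity support */
#define SCSI_OPTIONS_TAG    0x080  /* global tagged command support */
#define SCSI_OPTIONS_FAST   0x100  /* global FAST SCSI support */
#define SCSI_OPTIONS_WIDE   0x200  /* global WIDE SCSI support */

struct opt_name {
    unsigned bit;
    const char *name;
};

static const struct opt_name opt_names[] = {
    { SCSI_OPTIONS_DR,     "disconnect/reconnect" },
    { SCSI_OPTIONS_LINK,   "linked commands" },
    { SCSI_OPTIONS_SYNC,   "synchronous transfer" },
    { SCSI_OPTIONS_PARITY, "parity" },
    { SCSI_OPTIONS_TAG,    "tagged commands" },
    { SCSI_OPTIONS_FAST,   "fast" },
    { SCSI_OPTIONS_WIDE,   "wide" },
};

/* Hypothetical helper: nonzero if the capability bit is set. */
int scsi_option_enabled(unsigned options, unsigned bit)
{
    return (options & bit) != 0;
}

/* Hypothetical helper: print one line per known capability. */
void scsi_options_dump(unsigned options)
{
    size_t i;

    for (i = 0; i < sizeof(opt_names) / sizeof(opt_names[0]); i++)
        printf("%-22s %s\n", opt_names[i].name,
               scsi_option_enabled(options, opt_names[i].bit) ? "on" : "off");
}
```

With these values, 0x58 is exactly DR | LINK | PARITY, so synchronous, tagged, fast and wide are all switched off.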
On 26/04/11 11:08, Tarl Neustaedter wrote:
From glancing at the include file (usr/src/uts/sys/scsi/conf/autoconf.h, should be in the exported opensolaris), the bits 0x58 enable:
- Global disconnect/reconnect
- Global Linked Commands
- Global Parity Support
Notably it does *not* include global synchronous transfer capability, tagged command support or any of the fast/wide possibilities. I can't find where the default scsi_options are for the esp driver, but presumably it had some of those capabilities enabled, which aren't supported by the qemu scsi driver.
Oh I see. By enabling romvec debugging in OpenBIOS, I can see that it looks for a property named "scsi-options" in the ESP node of the device tree. Therefore the following patch may persuade Solaris to set this option in the ESP kernel module by default:
diff --git a/openbios-devel/drivers/esp.c b/openbios-devel/drivers/esp.c
index 2dfc2bd..d6fa9bc 100644
--- a/openbios-devel/drivers/esp.c
+++ b/openbios-devel/drivers/esp.c
@@ -383,6 +383,12 @@ ob_esp_initialize(__attribute__((unused)) esp_private_t **esp)
     push_str("scsi");
     fword("device-type");
 
+    /* set scsi-options to help Solaris boot */
+    PUSH(0x58);
+    fword("encode-int");
+    push_str("scsi-options");
+    fword("property");
+
     PUSH(0x24);
     fword("encode-int");
     PUSH(0);
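For anyone unfamiliar with the Forth words used in the patch: in Open Firmware, encode-int turns a 32-bit cell into a 4-byte big-endian property value, and property attaches it to the current node under the given name. The following standalone sketch shows only the byte layout a client would read back; encode_int() and decode_int() are illustrative names, not OpenBIOS or Solaris functions.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Illustrative only: encode a 32-bit value the way an Open Firmware
 * integer property is stored, i.e. as 4 bytes, most significant first.
 * Returns the number of bytes written.
 */
size_t encode_int(uint32_t value, uint8_t out[4])
{
    out[0] = (uint8_t)(value >> 24);
    out[1] = (uint8_t)(value >> 16);
    out[2] = (uint8_t)(value >> 8);
    out[3] = (uint8_t)value;
    return 4;
}

/* Illustrative only: read the 4-byte big-endian value back. */
uint32_t decode_int(const uint8_t in[4])
{
    return ((uint32_t)in[0] << 24) | ((uint32_t)in[1] << 16) |
           ((uint32_t)in[2] << 8)  |  (uint32_t)in[3];
}
```

So the scsi-options property created by the patch would hold the bytes 00 00 00 58, independent of the host's endianness.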
Nick - do you think you could do a quick test on this one?
ATB,
Mark.
On Tue, Apr 26, 2011 at 12:56 PM, Mark Cave-Ayland mark.cave-ayland@siriusit.co.uk wrote:
On 26/04/11 11:08, Tarl Neustaedter wrote:
From glancing at the include file (usr/src/uts/sys/scsi/conf/autoconf.h, should be in the exported opensolaris), the bits 0x58 enable:
- Global disconnect/reconnect
- Global Linked Commands
- Global Parity Support
Notably it does *not* include global synchronous transfer capability, tagged command support or any of the fast/wide possibilities. I can't find where the default scsi_options are for the esp driver, but presumably it had some of those capabilities enabled, which aren't supported by the qemu scsi driver.
Oh I see. By enabling romvec debugging in OpenBIOS, I can see that it looks for a property named "scsi-options" in the ESP node of the device tree. Therefore the following patch may persuade Solaris to set this option in the ESP kernel module by default:
diff --git a/openbios-devel/drivers/esp.c b/openbios-devel/drivers/esp.c
index 2dfc2bd..d6fa9bc 100644
--- a/openbios-devel/drivers/esp.c
+++ b/openbios-devel/drivers/esp.c
@@ -383,6 +383,12 @@ ob_esp_initialize(__attribute__((unused)) esp_private_t **esp)
     push_str("scsi");
     fword("device-type");
 
+    /* set scsi-options to help Solaris boot */
+    PUSH(0x58);
+    fword("encode-int");
+    push_str("scsi-options");
+    fword("property");
+
     PUSH(0x24);
     fword("encode-int");
     PUSH(0);
This would actually be fixing the symptom instead of the cause. It is OK if the cause is tagged queue support, though.
Nick - do you think you could do a quick test on this one?
ATB,
Mark.
On 26/04/11 12:37, Artyom Tarasenko wrote:
This would actually be fixing the symptom instead of the cause. It is OK if the cause is tagged queue support, though.
I agree. If it's a missing feature from the QEMU emulation then I have no issue with working around it - otherwise if it's an emulation bug, then we should probably fix that.
ATB,
Mark.
On Tue, Apr 26, 2011 at 3:09 PM, Mark Cave-Ayland mark.cave-ayland@siriusit.co.uk wrote:
On 26/04/11 12:37, Artyom Tarasenko wrote:
This would actually be fixing the symptom instead of the cause. It is OK if the cause is tagged queue support, though.
I agree. If it's a missing feature from the QEMU emulation then I have no issue with working around it - otherwise if it's an emulation bug, then we should probably fix that.
Was tagged queuing already used during Sparc32 times?
I think QEMU's disks advertise supporting SCSI-3, maybe it should be possible to downgrade this for old systems.
On Tue, Apr 26, 2011 at 12:08 PM, Tarl Neustaedter tarl-b2@tarl.net wrote:
Ah okay. What does that change to scsi_options actually *do*?
From glancing at the include file (usr/src/uts/sys/scsi/conf/autoconf.h, should be in the exported opensolaris), the bits 0x58 enable:
- Global disconnect/reconnect
- Global Linked Commands
- Global Parity Support
Notably it does *not* include global synchronous transfer capability, tagged command support or any of the fast/wide possibilities. I can't find where the default scsi_options are for the esp driver, but presumably it had some of those capabilities enabled, which aren't supported by the qemu scsi driver.
Hm. Fast/wide and synchronous should be no problem (unless there are certain timings expected). Tagged command is a bigger issue, since afaik none of QEMU's HBAs support it.
Hm. Fast/wide and synchronous should be no problem (unless there are certain timings expected).
Fast should be a non-issue. Wide may be a problem if we get odd sized transfers wrong. IIUC wide transfers are initiated by the target (disk) and qemu never does this, so should not be a problem either.
Tagged command is a bigger issue, since afaik none of QEMU's HBAs support it.
Other QEMU HBAs (LSI53C895A) do support tagged command queueing, however the ESP does not.
By my reading qemu esp emulation is missing proper handling of MESSAGE phases. It assumes only a single IDENTIFY byte is sent, probably interpreting subsequent SIMPLE queue tags as the start of the command block. Implementing full command queueing is fairly tricky. However a stub implementation that accepts the tag but avoids disconnecting (force sequential execution of commands) should be somewhat simpler.
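A rough sketch of the stub idea, assuming the message-out bytes and the command block arrive in one flat buffer: accept an IDENTIFY byte, optionally followed by a two-byte queue tag message, and report where the command block starts. The message codes are the standard SCSI-2 ones; the struct and function names are hypothetical and this is not QEMU code. Real hardware tells message bytes from command bytes by bus phase, so a flat-buffer parser like this is only an approximation.

```c
#include <stddef.h>
#include <stdint.h>

/* SCSI-2 message codes */
#define MSG_SIMPLE_TAG   0x20
#define MSG_HEAD_TAG     0x21
#define MSG_ORDERED_TAG  0x22
#define MSG_IDENTIFY     0x80  /* bit 7 set marks an IDENTIFY message */

/* Hypothetical result structure for this sketch. */
struct msgout_info {
    int lun;           /* LUN from IDENTIFY, or -1 if none was seen */
    int has_tag;       /* nonzero if a queue tag message followed */
    uint8_t tag_type;  /* MSG_SIMPLE_TAG, MSG_HEAD_TAG or MSG_ORDERED_TAG */
    uint8_t tag;       /* tag value chosen by the initiator */
};

/*
 * Returns the offset of the first byte after the recognised messages
 * (where the command block would begin), or -1 if a queue tag message
 * is truncated. A queue tag is only looked for directly after IDENTIFY.
 */
int parse_msgout(const uint8_t *buf, size_t len, struct msgout_info *info)
{
    size_t i = 0;

    info->lun = -1;
    info->has_tag = 0;
    info->tag_type = 0;
    info->tag = 0;

    if (len == 0 || !(buf[0] & MSG_IDENTIFY))
        return 0;                 /* no IDENTIFY: treat all bytes as command */

    info->lun = buf[0] & 0x07;    /* low three bits carry the LUN */
    i = 1;

    if (i < len && buf[i] >= MSG_SIMPLE_TAG && buf[i] <= MSG_ORDERED_TAG) {
        if (i + 1 >= len)
            return -1;            /* tag type without the tag byte */
        info->has_tag = 1;
        info->tag_type = buf[i];
        info->tag = buf[i + 1];
        i += 2;
    }
    return (int)i;
}
```

A stub along these lines would simply remember the tag and run the command to completion without disconnecting, which forces sequential execution.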
Paul
On Tue, Apr 26, 2011 at 9:53 PM, Paul Brook paul@codesourcery.com wrote:
Hm. Fast/wide and synchronous should be no problem (unless there are certain timings expected).
Fast should be a non-issue. Wide may be a problem if we get odd sized transfers wrong. IIUC wide transfers are initiated by the target (disk) and qemu never does this, so should not be a problem either.
After a bit of experimenting it turned out that I was wrong about synchronous. This seems to be the problem. The comment in the OpenSolaris esp driver http://src.opensolaris.org/source/xref/xen-gate/onnv-3.4/usr/src/uts/sun/io/... also explains one reason why:
/*
 * Set up to send synch. negotiating message. This is getting
 * a bit tricky as we dma out the identify message and
 * send the other messages via the fifo buffer.
 */
Looks like the qemu esp doesn't support mixing dma/fifo messages.
Tagged command is a bigger issue, since afaik none of QEMU's HBAs support it.
Other QEMU HBAs (LSI53C895A) do support tagged command queueing, however the ESP does not.
By my reading qemu esp emulation is missing proper handling of MESSAGE phases. It assumes only a single IDENTIFY byte is sent, probably interpreting subsequent SIMPLE queue tags as the start of the command block. Implementing full command queueing is fairly tricky. However a stub implementation that accepts the tag but avoids disconnecting (force sequential execution of commands) should be somewhat simpler.
This seems to be necessary too because the OpenSolaris driver seems to use at least two tags:
esp->e_cur_msgout[i++] = sp->cmd_tag[0];
esp->e_cur_msgout[i++] = sp->cmd_tag[1];
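For reference, here is a sketch of what such a message-out sequence could look like on the wire: IDENTIFY, then the two tag bytes (tag type and tag value), then a synchronous data transfer request (SDTR). The byte formats follow SCSI-2; the ordering of the pieces and the function name build_msgout() are assumptions for illustration, not taken from the OpenSolaris source.

```c
#include <stddef.h>
#include <stdint.h>

#define MSG_EXTENDED  0x01  /* SCSI-2 extended message */
#define MSG_EXT_SDTR  0x01  /* synchronous data transfer request */
#define MSG_IDENTIFY  0x80
#define MSG_DISCPRIV  0x40  /* IDENTIFY bit 6: disconnect privilege */

/*
 * Hypothetical helper: fill out[] with IDENTIFY, an optional 2-byte
 * queue tag message (tag == NULL to omit it) and an optional 5-byte
 * SDTR. Returns the number of bytes written, or 0 if cap is too small.
 */
size_t build_msgout(uint8_t *out, size_t cap, uint8_t lun, int allow_disconnect,
                    const uint8_t *tag, int want_sync,
                    uint8_t period, uint8_t offset)
{
    size_t need = 1 + (tag ? 2 : 0) + (want_sync ? 5 : 0);
    size_t n = 0;

    if (cap < need)
        return 0;

    out[n++] = (uint8_t)(MSG_IDENTIFY |
                         (allow_disconnect ? MSG_DISCPRIV : 0) |
                         (lun & 0x07));
    if (tag) {
        out[n++] = tag[0];       /* tag type, e.g. 0x20 for SIMPLE */
        out[n++] = tag[1];       /* tag value */
    }
    if (want_sync) {
        out[n++] = MSG_EXTENDED;
        out[n++] = 0x03;         /* extended message length */
        out[n++] = MSG_EXT_SDTR;
        out[n++] = period;       /* transfer period factor */
        out[n++] = offset;       /* REQ/ACK offset */
    }
    return n;
}
```

If only the first byte of this sequence goes out via DMA and the remaining bytes via the FIFO, an emulation that handles just one of the two paths per command will see an incomplete message.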
Artyom
On 27/04/11 17:31, Artyom Tarasenko wrote:
After a bit of experimenting it turned out that I was wrong about synchronous. This seems to be the problem. The comment in the OpenSolaris esp driver http://src.opensolaris.org/source/xref/xen-gate/onnv-3.4/usr/src/uts/sun/io/... also explains one reason why:
/*
 * Set up to send synch. negotiating message. This is getting
 * a bit tricky as we dma out the identify message and
 * send the other messages via the fifo buffer.
 */
Looks like the qemu esp doesn't support mixing dma/fifo messages.
(cut)
This seems to be necessary too because the OpenSolaris driver seems to use at least two tags:
esp->e_cur_msgout[i++] = sp->cmd_tag[0];
esp->e_cur_msgout[i++] = sp->cmd_tag[1];
Nice detective work! Is a fix for this particularly easy to create, or would my patch be the best short term fix?
ATB,
Mark.
On Wed, Apr 27, 2011 at 10:29 PM, Mark Cave-Ayland mark.cave-ayland@siriusit.co.uk wrote:
On 27/04/11 17:31, Artyom Tarasenko wrote:
After a bit of experimenting it turned out that I was wrong about synchronous. This seems to be the problem. The comment in the OpenSolaris esp driver
http://src.opensolaris.org/source/xref/xen-gate/onnv-3.4/usr/src/uts/sun/io/... also explains one reason why:
/*
 * Set up to send synch. negotiating message. This is getting
 * a bit tricky as we dma out the identify message and
 * send the other messages via the fifo buffer.
 */
Looks like the qemu esp doesn't support mixing dma/fifo messages.
(cut)
This seems to be necessary too because the OpenSolaris driver seems to use at least two tags:
esp->e_cur_msgout[i++] = sp->cmd_tag[0];
esp->e_cur_msgout[i++] = sp->cmd_tag[1];
Nice detective work! Is a fix for this particularly easy to create, or would my patch be the best short term fix?
Maybe ESP could be modified so that a DMA operation would just push the data to FIFO, this is probably what happens in a real ESP. But I'd just use your patch for now.
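A toy model of that idea, to show what "DMA just pushes into the FIFO" would mean: both the CPU's register writes and the DMA engine feed one shared byte queue, so the consumer sees a single ordered stream regardless of how each byte arrived. The 16-byte depth matches my understanding of the real chip but is an assumption here, and all names are made up for this sketch; none of it is taken from QEMU's esp code.

```c
#include <stddef.h>
#include <stdint.h>

#define FIFO_DEPTH 16  /* assumed depth for this sketch */

/* Hypothetical model of the chip's FIFO as a ring buffer. */
struct esp_fifo {
    uint8_t data[FIFO_DEPTH];
    size_t head;   /* index of the oldest byte */
    size_t count;  /* number of bytes currently queued */
};

/* Append one byte; returns 0 on success, -1 if the FIFO is full. */
int fifo_push(struct esp_fifo *f, uint8_t b)
{
    if (f->count == FIFO_DEPTH)
        return -1;
    f->data[(f->head + f->count) % FIFO_DEPTH] = b;
    f->count++;
    return 0;
}

/* Remove the oldest byte; returns it, or -1 if the FIFO is empty. */
int fifo_pop(struct esp_fifo *f)
{
    uint8_t b;

    if (f->count == 0)
        return -1;
    b = f->data[f->head];
    f->head = (f->head + 1) % FIFO_DEPTH;
    f->count--;
    return b;
}

/* CPU write to the FIFO register (programmed I/O path). */
int pio_write(struct esp_fifo *f, uint8_t b)
{
    return fifo_push(f, b);
}

/* DMA path: copy from memory into the same FIFO; returns bytes queued. */
size_t dma_to_fifo(struct esp_fifo *f, const uint8_t *mem, size_t len)
{
    size_t n = 0;

    while (n < len && fifo_push(f, mem[n]) == 0)
        n++;
    return n;
}
```

With this shape, an IDENTIFY byte sent by DMA followed by tag or negotiation bytes written through the FIFO register would come out in the order they were sent.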
On Tue, Apr 26, 2011 at 11:56 AM, Mark Cave-Ayland mark.cave-ayland@siriusit.co.uk wrote:
On 26/04/11 10:30, Artyom Tarasenko wrote:
I suspect that it's a qemu esp bug. Either Solaris recognizes a wrong esp chipset version, or qemu esp doesn't implement some necessary features of the esp version it claims to be. In the latter case we should fix qemu, not OpenBIOS. If you can find out what Solaris is expecting, I would try to implement it in esp.
Ah okay. What does that change to scsi_options actually *do*? I've seen reference to it in your HOWTO, but I don't see any explanation as to why it is needed and/or what symptoms you were seeing and how you determined this was the correct workaround?
That was easy: booting in a single user mode from an install CD worked, booting from a HDD image didn't. I looked at the installer /etc/system and took the options from there. :-)