Blue Swirl wrote:
My tests are again like with r731, but Milax does not get as far as with r732.
Really? I know Milax segfaults with r733 but it's definitely getting a lot further if you enable DEBUG_CIF - it looks like it's trying to read files from the UFS filesystem before it finally dies.
The tests which had problems in r733 were Ubuntu 20080110.1, Aurora 2.0 and 2.1 install CDs.
I think we're definitely chasing an emulation/IO bug here. For example, trying to boot MarTux (Natamar_0.4__b96_sparc_cdrom.iso) with current OpenBIOS SVN gives the following:
OpenBIOS for Sparc64
Configuration device id QEMU version 1 machine id 0
kernel cmdline
CPUs: 1 x SUNW,UltraSPARC-II
UUID: 00000000-0000-0000-0000-000000000000
Welcome to OpenBIOS v1.0 built on Apr 5 2010 10:06
Type 'help' for detailed information

[sparc64] Booting file 'disk' with parameters ''
Not a bootable ELF image
Not a Linux kernel image
Not a bootable a.out image
Loading FCode image...
Loaded 7392 bytes
entry point is 0x4000

Jumping to entry point 0000000000004000 for type 0000000000000010...
Evaluating FCode...
Unhandled Exception 0x0000000000000032
PC = 0x00000000ffd1bf10 NPC = 0x00000000ffd1bf14
Stopping execution
Checking openbios-builtin.syms shows that the exception is happening in drivers/ide.c:ob_ide_insw. Adding a debugging printk confirms this:
(lots cut)
port: 500 addr: 00000000ffec66c0 count: 100
port: 500 addr: 0000000008002000 count: 100
port: 500 addr: 0000000008002200 count: 100
port: 500 addr: 0000000008002400 count: 100
port: 500 addr: 0000000008002600 count: 100
port: 500 addr: 0000000051000000 count: 100
Unhandled Exception 0x0000000000000032
PC = 0x00000000ffd1df74 NPC = 0x00000000ffd1df78
Stopping execution
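For anyone wanting to repeat the symbol lookup, here is a rough sketch of mapping a PC to the enclosing function. It assumes openbios-builtin.syms holds nm-style lines ("<hex address> <type> <name>"); the sample table is made up for illustration (only ob_ide_insw comes from the lookup described above, the other names and all addresses are hypothetical), so treat it as a sketch rather than the exact procedure used.

```python
import bisect

def load_symbols(lines):
    """Parse nm-style lines ("<hex address> <type> <name>") into a sorted list."""
    symbols = []
    for line in lines:
        parts = line.split()
        if len(parts) != 3:
            continue  # skip blank, undefined-symbol or malformed lines
        address, _kind, name = parts
        symbols.append((int(address, 16), name))
    symbols.sort()
    return symbols

def lookup(symbols, pc):
    """Return (name, offset) of the last symbol at or below pc, or None."""
    addresses = [address for address, _name in symbols]
    index = bisect.bisect_right(addresses, pc) - 1
    if index < 0:
        return None
    address, name = symbols[index]
    return name, pc - address

# Hypothetical excerpt; the real contents of openbios-builtin.syms will differ.
SAMPLE = """\
00000000ffd1be00 t ob_ide_error
00000000ffd1bf00 t ob_ide_insw
00000000ffd1bf40 t ob_ide_outsw
""".splitlines()

symbols = load_symbols(SAMPLE)
print(lookup(symbols, 0xFFD1BF10))
```

The lookup only reports the nearest preceding symbol, so a PC that falls past the end of the last function in the table would still be attributed to it.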
Note that 0x51000000 is the address where the FCode normally loads the ELF image from disk/CDROM. What is more interesting is that the same call with the same parameters seems to work fine if I leave the printk in place while booting Milax:
(lots cut)
port: 500 addr: 00000000ffec66c0 count: 100
port: 500 addr: 0000000008002000 count: 100
port: 500 addr: 0000000008002200 count: 100
port: 500 addr: 0000000008002400 count: 100
port: 500 addr: 0000000008002600 count: 100
port: 500 addr: 0000000051000000 count: 100
port: 500 addr: 0000000051000200 count: 100
port: 500 addr: 0000000051000400 count: 100
port: 500 addr: 0000000051000600 count: 100
port: 500 addr: 0000000051000800 count: 100
port: 500 addr: 0000000051000a00 count: 100
port: 500 addr: 0000000051000c00 count: 100
port: 500 addr: 0000000051000e00 count: 100
port: 500 addr: 0000000051001000 count: 100
(lots cut)
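As a sanity check on the transfer sizes, the destination address advances by 0x200 between calls, which is what you would expect if count is printed in hex and counts 16-bit words (0x100 words = 512 bytes). Both of those are assumptions about the debugging printk rather than something the log states, and the helper names below are made up for this sketch:

```python
import re

LINE = re.compile(r"port: ([0-9a-f]+) addr: ([0-9a-f]+) count: ([0-9a-f]+)")

def parse_transfers(log):
    """Return (port, addr, count) tuples, reading every field as hexadecimal."""
    return [tuple(int(field, 16) for field in match.groups())
            for match in LINE.finditer(log)]

def strides(transfers):
    """Address difference between each pair of consecutive transfers."""
    return [b[1] - a[1] for a, b in zip(transfers, transfers[1:])]

# Three consecutive lines taken from the Milax log.
LOG = """
port: 500 addr: 0000000051000000 count: 100
port: 500 addr: 0000000051000200 count: 100
port: 500 addr: 0000000051000400 count: 100
"""

transfers = parse_transfers(LOG)
bytes_per_call = transfers[0][2] * 2  # assumption: count is in 16-bit words
print(strides(transfers), bytes_per_call)
```

If the stride and the computed bytes per call ever disagree in a longer log, that would point at the caller rather than at the port read itself.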
So I'm thinking perhaps this is some kind of emulation bug rather than an IO bug. The SPARC docs refer to trap 0x32 as a "data_access_error", which qemu is generating in response to some condition, but I'm not sure yet what that condition is.
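To make the exception lines easier to read, here is a small decoding sketch. The table only covers the data access traps around 0x32, with names as given in the SPARC V9 architecture manual's trap type table; the function names are made up for illustration, and this says nothing about which condition qemu is actually checking.

```python
# Trap type numbers from the SPARC V9 trap type table (data access range only).
TRAP_NAMES = {
    0x030: "data_access_exception",
    0x031: "data_access_MMU_miss",
    0x032: "data_access_error",
    0x033: "data_access_protection",
    0x034: "mem_address_not_aligned",
}

def trap_name(trap_type):
    """Return the trap name, or a placeholder for types not in the table."""
    return TRAP_NAMES.get(trap_type, "unknown (0x%03x)" % trap_type)

def parse_exception(line):
    """Extract the trap type from an 'Unhandled Exception 0x...' line."""
    prefix = "Unhandled Exception "
    if not line.startswith(prefix):
        return None
    return int(line[len(prefix):].split()[0], 16)

print(trap_name(parse_exception("Unhandled Exception 0x0000000000000032")))
```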
ATB,
Mark.