Re: [OpenBIOS] sparc-softmmu uninitialized memory read?

6 May 2012


On Sun, May 6, 2012 at 2:02 PM, Andreas Färber afaerber@suse.de wrote:
...
On 06.05.2012 13:32, Blue Swirl wrote:
...
On Sat, May 5, 2012 at 3:37 PM, Andreas Färber afaerber@suse.de wrote:
...
Hello Blue,
While testing a potential AREG0 fix by malc for ppc hosts, I got an error
running `./sparc-softmmu/qemu-system-sparc` (the same happens with a CD/kernel):
qemu: fatal: Trap 0x07 while interrupts disabled, Error state
pc: 00005e0c  npc: 00005e10
General Registers:
%g0-7: 00000000 00000001 babababa 00000000 00000020 07ffff08 07ffe000 babababa
Current Register Window:
%o0-7: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
%l0-7: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
%i0-7: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
Floating Point Registers:
%f00: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
%f08: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
%f16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
%f24: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
psr: 048000c0 (icc: N--- SPE: SP-) wim: 00000001
fsr: 00000000 y: 00000020
Aborted
The 0xbabababa in %g2 and %g7 is a signature I've seen in uninitialized
memory on openSUSE 12.1 Betas. So I ran valgrind, and the following
caught my eye on both ppc and x86_64:
==18801== Command: ./sparc-softmmu/qemu-system-sparc
==18801==
==18801== Thread 2:
==18801== Conditional jump or move depends on uninitialised value(s)
==18801==    at 0x25C5AF: compute_all_logic (cc_helper.c:37)
==18801==    by 0x25C648: helper_compute_psr (cc_helper.c:470)
==18801==    by 0x8CD0981: ???
==18801==  Uninitialised value was created by a heap allocation
==18801==    at 0x4C27CE8: memalign (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
==18801==    by 0x4C27D97: posix_memalign (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
==18801==    by 0x1F2101: qemu_memalign (oslib-posix.c:93)
==18801==    by 0x1F21A9: qemu_vmalloc (oslib-posix.c:126)
==18801==    by 0x2665F6: qemu_ram_alloc_from_ptr (exec.c:2647)
==18801==    by 0x286D76: memory_region_init_ram (memory.c:954)
==18801==    by 0x297FFD: ram_init1 (sun4m.c:757)
==18801==    by 0x204DAE: qdev_init (qdev.c:151)
==18801==    by 0x204EEC: qdev_init_nofail (qdev.c:258)
==18801==    by 0x298845: ram_init.constprop.7 (sun4m.c:783)
==18801==    by 0x298980: sun4m_hw_init (sun4m.c:862)
==18801==    by 0x2994A2: ss5_init (sun4m.c:1289)
This is at 8f473dd104f0937ce98523fa6f9de0bd845aebbe, and cc_helper.c:37
branches on the int32_t dst argument of get_NZ_icc(), which is always
called with CC_DST, i.e. env->cc_dst.
Does this indicate that cc_dst was initialized from a read of
uninitialized memory?
That should be taken care of in target-sparc/cpu.c:45:
    memset(env, 0, offsetof(CPUSPARCState, breakpoints));
cc_dst lies between the start of the structure and CPU_COMMON, so it is
covered by that memset.
Commit 89aaf60dedbe0e6415acfe816e02b538e5c54e68 recently fixed a
reset-related bug.
The still-current master commit above already includes that fix, though,
and in any case it wouldn't explain uninitialized memory originating from
sun4m RAM as opposed to QOM's object_new(). IIUC, somewhere (possibly in
OpenBIOS) uninitialized memory is read, and the value is then stored into
CPUSPARCState after the structure has been zero-initialized.
OK, I see it now. OpenBIOS assumes that the Sparc32 SMP table is valid
whenever its valid field is nonzero, which indicates secondary processor
setup, so OpenBIOS jumps to the location given in the SMP table. With
0xbabababa in memory, this fragile logic fails, causing the early crash.
https://tracker.coreboot.org/trac/openbios/browser/trunk/openbios-devel/arch...
https://tracker.coreboot.org/trac/openbios/browser/trunk/openbios-devel/arch...
I think the current logic would also not survive a reset that happens just
as a secondary processor is being brought online.
The fix is to make the SMP table logic robust, for example with a
checksum. We could also read the CPU ID from the MXCC and skip the check
for the boot CPU, though not all models should have an MXCC.
...
My issue here is that sparc64 boots HelenOS fine up until it tries to
load the kernel (identical to an x86_64 host), but sparc32 exits very
early on ppc. There may well be a bug hidden in malc's TCG patch that
causes the fatal error state, but the uninitialized memory report appears
on both TCG hosts, so it is unlikely to be TCG-related.
/-F
...
...
Any idea where that could originate from, or how to debug it further?
It doesn't seem to be caused by the
7d21dcc84b8c07918124a9c0708694d2fb013f65 OpenBIOS r1056 update.
Regards,
Andreas
--
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany
GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer; HRB 16746 AG Nürnberg