* Laszlo Ersek (lersek@redhat.com) wrote:
On 01/23/17 16:49, Kevin O'Connor wrote:
On Mon, Jan 23, 2017 at 11:11:02AM +0100, Laszlo Ersek wrote:
On 01/20/17 20:39, Dr. David Alan Gilbert wrote:
* Kevin O'Connor (kevin@koconnor.net) wrote:
On Fri, Jan 20, 2017 at 06:40:44PM +0000, Dr. David Alan Gilbert wrote:
Hi, I turned the debug level up to 4 on our smaller (128k) ROM downstream build and seem to have hit a case where it's been laid out so that the 'ExtraStack' is at the same location as some code (display_uuid), which was causing some very random behaviour;
[...]
Would this be consistent with a stack overflow?
See commit 46b82624c95b951e8825fab117d9352faeae0ec8. Perhaps BUILD_EXTRA_STACK_SIZE (2KB) is too small now?
The ExtraStack isn't used at the point Dave reports the problem - display_uuid() is part of the init phase and that happens on the main "post" stack.
[...]
(This is based off 1.9.1)
I missed that earlier - there were some important fixes post 1.9.1 wrt reboots. Commits b837e68d / a48f602c2 could explain the issue. I'd make sure the issue is still present on the latest version.
That's a very promising hunch -- b837e68d explicitly mentions "reboot loop" in the subject. It seems that Dave didn't mention any RHBZ numbers in his email, but we have two somewhat similar bug reports (which I hope share a root cause) and the second report triggers the issue with a reboot loop specifically.
https://bugzilla.redhat.com/show_bug.cgi?id=1411275
https://bugzilla.redhat.com/show_bug.cgi?id=1382906
(Apologies that the 2nd RHBZ is not public; it's currently filed for the RH kernel, and those BZs default to private. :/)
CC'ing DavidH too, for RHBZ#1382906.
Yeh, it's looking promising; I've done a build with low debug that survived 50+ reboots, then turned my debug on, and it's at 20 so far, so that's pretty good.
However, reading the commits I'm a little confused.
I don't seem to have hit any cases where it has taken the shutdown path after failing to reboot, so it's not that.
My reboots in this case are always guest triggered, so they're not very early reboots.
One comment in there is:
+ // Some old versions of KVM don't store a pristine copy of the
+ // BIOS in high memory. Try to shutdown the machine instead.
Do you have a definition of 'old'? In this case it's a new-ish qemu on our downstream (older) kernel, but that kernel has fairly new kvm bits in it; on the other hand, the qemu is configured in our rhel6 compatibility mode - so hmm.
Dave
(With any luck this post will act as a jinx to wake up any sleeping bugs that make me think it's OK....)
Thank you! Laszlo
-- Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK