Hi all,
I've spent some time today looking at the ELF loader code for each different architecture, and with a bit of work I believe that I can modify arch/sparc64/elfload.c so that it can be used across all architectures.
Looking closer at the source, it can be seen that are actually two sets of ELF routines: one in arch/*/elfload.c, and another set in libopenbios/elfload.c which only appears to be used by the PPC code. I've already manage to refactor the SPARC64 code so it reuses (and also exposes) the same API for use with the PPC code, except with one exception: file handles.
In arch/*/elfload.c, the preferred method for using handles is to make use of the included loadfs.c methods which works with an implicit file handle (i.e. the file handle isn't passed into any of the functions). In contrast, the PPC code directly calls the *_io() API (see find_elf()/elf_readhdrs()) and so requires an explicit file handle to be passed into each function.
I'm currently leaning towards the PPC method, since loadfs.c seems to be such a lightweight wrapper that it hardly seems worth keeping. Can anyone see a reason to keep this before I go and rip it out?
ATB,
Mark.
On 3/20/10, Mark Cave-Ayland mark.cave-ayland@siriusit.co.uk wrote:
Hi all,
I've spent some time today looking at the ELF loader code for each different architecture, and with a bit of work I believe that I can modify arch/sparc64/elfload.c so that it can be used across all architectures.
Looking closer at the source, it can be seen that are actually two sets of ELF routines: one in arch/*/elfload.c, and another set in libopenbios/elfload.c which only appears to be used by the PPC code. I've already manage to refactor the SPARC64 code so it reuses (and also exposes) the same API for use with the PPC code, except with one exception: file handles.
In arch/*/elfload.c, the preferred method for using handles is to make use of the included loadfs.c methods which works with an implicit file handle (i.e. the file handle isn't passed into any of the functions). In contrast, the PPC code directly calls the *_io() API (see find_elf()/elf_readhdrs()) and so requires an explicit file handle to be passed into each function.
I'm currently leaning towards the PPC method, since loadfs.c seems to be such a lightweight wrapper that it hardly seems worth keeping. Can anyone see a reason to keep this before I go and rip it out?
I think ideally the internal APIs should be identical to POSIX ones (open, lseek, read/write, close). So this would be a step into right direction.
Blue Swirl wrote:
I'm currently leaning towards the PPC method, since loadfs.c seems to be such a lightweight wrapper that it hardly seems worth keeping. Can anyone see a reason to keep this before I go and rip it out?
I think ideally the internal APIs should be identical to POSIX ones (open, lseek, read/write, close). So this would be a step into right direction.
Okay - in that case I'll rip it out. Just as a side question, what could I use to test that I haven't broken the ELF code in QEMU? I'm guessing I could test using something like the FC12 bootable CD for each platform (x86, PPC and SPARC64) to check?
ATB,
Mark.
On 3/20/10, Mark Cave-Ayland mark.cave-ayland@siriusit.co.uk wrote:
Blue Swirl wrote:
I'm currently leaning towards the PPC method, since loadfs.c seems to
be
such a lightweight wrapper that it hardly seems worth keeping. Can
anyone
see a reason to keep this before I go and rip it out?
I think ideally the internal APIs should be identical to POSIX ones (open, lseek, read/write, close). So this would be a step into right direction.
Okay - in that case I'll rip it out. Just as a side question, what could I use to test that I haven't broken the ELF code in QEMU? I'm guessing I could test using something like the FC12 bootable CD for each platform (x86, PPC and SPARC64) to check?
I think SILO and BSD loaders are still using a.out and Solaris uses FCode.
On 2010-3-20 3:03 PM, Blue Swirl wrote:
[...]
Okay - in that case I'll rip it out. Just as a side question, what could I use to test that I haven't broken the ELF code in QEMU? I'm guessing I could test using something like the FC12 bootable CD for each platform (x86, PPC and SPARC64) to check?
I think SILO and BSD loaders are still using a.out and Solaris uses FCode.
Solaris uses FCode for the primary booter (blocks 1-15), and ELF for the secondary booter (/platform/xxx file read in by primary booter). Both call back into Openboot code for I/O, until reaching "lights out", when Solaris takes over the trap table and runs its own drivers.
Blue Swirl wrote:
Okay - in that case I'll rip it out. Just as a side question, what could I use to test that I haven't broken the ELF code in QEMU? I'm guessing I could test using something like the FC12 bootable CD for each platform (x86, PPC and SPARC64) to check?
I think SILO and BSD loaders are still using a.out and Solaris uses FCode.
Right - I now have an experimental patch available which moves all of the loaders except linux_load.c into libopenbios/ and refactors everything else accordingly. I suspect if other people were to help out, we could do a similar thing with the Linux loader too as the overall code looks similar.
The good parts of the patch are that it consolidates a lot of code that was split across multiple arch/ directories into one place and so a lot of duplicate code can be ripped out. This also has the advantage of being able to offer several loaders across multiple architectures very easily.
The bad news is that while I can compile across the entire range of platforms, there are still some remaining issues. For example, my SPARC64 Milax ISO test now crashes which will require some digging. However, at the moment it's quite a large patch and part of me feels I should commit what I've got so that other people can get eyeballs on this and help out.
Anyone feel strongly either way? I've attached what I have so far.
Mark.
Hello,
Am 22.03.2010 um 13:15 schrieb Mark Cave-Ayland:
my SPARC64 Milax ISO test now crashes [...]. However, [...] part of me feels I should commit what I've got so that other people can get eyeballs on this and help out.
Please don't commit known-bad refactoring steps, that's really bad for bisecting. If you want more review, consider pushing a Git branch somewhere. Note that most patches sent to this list are not git-am'able so need special care when applying, like manually undoing quoted-printable...
Andreas
Andreas Färber wrote:
my SPARC64 Milax ISO test now crashes [...]. However, [...] part of me feels I should commit what I've got so that other people can get eyeballs on this and help out.
Please don't commit known-bad refactoring steps, that's really bad for bisecting. If you want more review, consider pushing a Git branch somewhere. Note that most patches sent to this list are not git-am'able so need special care when applying, like manually undoing quoted-printable...
Thanks for the response Andreas.
I'm very happy that the patch is the way forward given the current diversity within the OpenBIOS codebase. I'm just nervous that a longer timeframe means that other patches will causing merge difficulties in the future for what is already quite a large patch (see attachment with my last email).
I think the best that I can do so far is to commit when it passes my tests which are primarily on SPARC64. In terms of other platforms, I've asked more than once how I can test them before commit but no-one has offered any responses. Then again, as a result of the new refactoring I would expect fixes to be a lot simpler.
Maybe it would make more sense for OpenBIOS to move over to git for development testing - but then again, commits of this size will undoubtedly be quite rare. What does everyone else think?
ATB,
Mark.
On 3/23/10, Mark Cave-Ayland mark.cave-ayland@siriusit.co.uk wrote:
Andreas Färber wrote:
my SPARC64 Milax ISO test now crashes [...]. However, [...] part of me
feels I should commit what I've got so that other people can get eyeballs on this and help out.
Please don't commit known-bad refactoring steps, that's really bad for
bisecting.
If you want more review, consider pushing a Git branch somewhere. Note
that most patches sent to this list are not git-am'able so need special care when applying, like manually undoing quoted-printable...
Thanks for the response Andreas.
I'm very happy that the patch is the way forward given the current diversity within the OpenBIOS codebase. I'm just nervous that a longer timeframe means that other patches will causing merge difficulties in the future for what is already quite a large patch (see attachment with my last email).
I think the best that I can do so far is to commit when it passes my tests which are primarily on SPARC64. In terms of other platforms, I've asked more than once how I can test them before commit but no-one has offered any responses. Then again, as a result of the new refactoring I would expect fixes to be a lot simpler.
I don't think I have any ELF test cases, unless Solaris happens to use our ELF loader during the second stage.
Maybe it would make more sense for OpenBIOS to move over to git for development testing - but then again, commits of this size will undoubtedly be quite rare. What does everyone else think?
I like git, though with git-svn I can work with my tree with StGIT, QGit and git tools as if the main repository were also git.
On 3/22/10, Mark Cave-Ayland mark.cave-ayland@siriusit.co.uk wrote:
Blue Swirl wrote:
Okay - in that case I'll rip it out. Just as a side question, what
could I
use to test that I haven't broken the ELF code in QEMU? I'm guessing I
could
test using something like the FC12 bootable CD for each platform (x86,
PPC
and SPARC64) to check?
I think SILO and BSD loaders are still using a.out and Solaris uses FCode.
Right - I now have an experimental patch available which moves all of the loaders except linux_load.c into libopenbios/ and refactors everything else accordingly. I suspect if other people were to help out, we could do a similar thing with the Linux loader too as the overall code looks similar.
The good parts of the patch are that it consolidates a lot of code that was split across multiple arch/ directories into one place and so a lot of duplicate code can be ripped out. This also has the advantage of being able to offer several loaders across multiple architectures very easily.
The bad news is that while I can compile across the entire range of platforms, there are still some remaining issues. For example, my SPARC64 Milax ISO test now crashes which will require some digging. However, at the moment it's quite a large patch and part of me feels I should commit what I've got so that other people can get eyeballs on this and help out.
Anyone feel strongly either way? I've attached what I have so far.
Bad commits aren't very nice. Perhaps you should split the patche into more digestable pieces? For example the implicit vs. explicit file descriptor change should be safe.
Blue Swirl wrote:
Bad commits aren't very nice. Perhaps you should split the patche into more digestable pieces? For example the implicit vs. explicit file descriptor change should be safe.
Hmmm that could be possible, although in order to commit a compilable build, it means making similar changes all over the arch/ tree before then moving just one copy into libopenbios. So that's a lot of extra work just to then rip it out in a subsequent commit :( I think I'd much rather try and figure out why my latest SVN development code is failing in the Fcode loader if at all possible.
Incidentally, just to check I pulled out a fresh r684 from SVN which is the last commit before I started work on the refactoring. I then did a complete build check to see what I could compile. I then repeated the test with current SVN to ensure I hadn't broken anything during the first stage.
For those who are interested, the current SVN build status for me is given below (using cross-compiling where appropriate on x86 Linux):
arch/amd64 : builds openbios-unix only arch/ppc/briq : fails to compile with "Error -34 occured." error arch/ppc/mol : fails to compile with "undefined word" error arch/ppc/pearpc : fails to compile with "Error -34 occured." error arch/ppc/qemu : builds ok arch/sparc32 : builds ok arch/sparc64 : builds ok arch/x86 : builds ok
So in theory, I should be most of the way there by having a compilable build for the 4 successful platforms above. PPC does look quite broken at the moment though :( Is there a current maintainer for the PPC code?
ATB,
Mark.
On 3/23/10, Mark Cave-Ayland mark.cave-ayland@siriusit.co.uk wrote:
Blue Swirl wrote:
Bad commits aren't very nice. Perhaps you should split the patche into more digestable pieces? For example the implicit vs. explicit file descriptor change should be safe.
Hmmm that could be possible, although in order to commit a compilable build, it means making similar changes all over the arch/ tree before then moving just one copy into libopenbios. So that's a lot of extra work just to then rip it out in a subsequent commit :( I think I'd much rather try and figure out why my latest SVN development code is failing in the Fcode loader if at all possible.
How about the other way round then, make the change after merge?
Incidentally, just to check I pulled out a fresh r684 from SVN which is the last commit before I started work on the refactoring. I then did a complete build check to see what I could compile. I then repeated the test with current SVN to ensure I hadn't broken anything during the first stage.
For those who are interested, the current SVN build status for me is given below (using cross-compiling where appropriate on x86 Linux):
arch/amd64 : builds openbios-unix only arch/ppc/briq : fails to compile with "Error -34 occured." error arch/ppc/mol : fails to compile with "undefined word" error arch/ppc/pearpc : fails to compile with "Error -34 occured." error arch/ppc/qemu : builds ok arch/sparc32 : builds ok arch/sparc64 : builds ok arch/x86 : builds ok
So in theory, I should be most of the way there by having a compilable build for the 4 successful platforms above. PPC does look quite broken at the moment though :( Is there a current maintainer for the PPC code?
Laurent used to do a lot of PPC stuff, but he hasn't been active for a while. Didn't Alexander also get commit access?
I test ppc/qemu every now and then, but I don't know how to test briq, mol or pearpc. Also for x86 and amd64 I only test unix build.
Blue Swirl wrote:
Hmmm that could be possible, although in order to commit a compilable build, it means making similar changes all over the arch/ tree before then moving just one copy into libopenbios. So that's a lot of extra work just to then rip it out in a subsequent commit :( I think I'd much rather try and figure out why my latest SVN development code is failing in the Fcode loader if at all possible.
How about the other way round then, make the change after merge?
Again, there is a similar situation here. In order to bring the loaders into libopenbios, I had to implement the concept of "saved-program-state" from the OF spec. Hence all off the loaders (and also platform-boot) need to be modified to take this into account :(
For those who are interested, the current SVN build status for me is given below (using cross-compiling where appropriate on x86 Linux):
arch/amd64 : builds openbios-unix only arch/ppc/briq : fails to compile with "Error -34 occured." error arch/ppc/mol : fails to compile with "undefined word" error arch/ppc/pearpc : fails to compile with "Error -34 occured." error arch/ppc/qemu : builds ok arch/sparc32 : builds ok arch/sparc64 : builds ok arch/x86 : builds ok
So in theory, I should be most of the way there by having a compilable build for the 4 successful platforms above. PPC does look quite broken at the moment though :( Is there a current maintainer for the PPC code?
Laurent used to do a lot of PPC stuff, but he hasn't been active for a while. Didn't Alexander also get commit access?
I test ppc/qemu every now and then, but I don't know how to test briq, mol or pearpc. Also for x86 and amd64 I only test unix build.
Once of the things that has bugged me for a while has been that forthstrap is not particularly helpful when an error occurs. I've added some code to help with this, so now forthstrap will report a file name and line number if an undefined word is used or a runtime word causes a Forth error.
Similary, based on my mini-PPC fault-finding experience, I altered the exception handler routine to be a function pointer to allow the handler to be changed during bootstrap. This also now allows us to indicate whether the error is caused by executing source directly or by executing an existing base dictionary.
Now when I try and build PPC pearpc I get this:
Configuring OpenBIOS on amd64 for cross-ppc Initializing build tree obj-ppc...ok. Creating target Makefile...ok. Creating config files...ok. Building OpenBIOS for ppc Building...error: make[1]: Entering directory `/home/build/src/openbios/openbios-devel.pre/obj-ppc' GEN target/include/openbios-version.h GEN forth/version.fs HOSTCC host/kernel/bootstrap.o HOSTCC host/kernel/dict.o HOSTCC host/kernel/primitives.o HOSTCC host/kernel/stack.o HOSTCC forthstrap GEN bootstrap.dict GEN openbios.dict GEN openbios-pearpc.dict Error executing base dictionary ./openbios.dict: error -34 occured. make[1]: *** [openbios-pearpc.dict] Error 1 make[1]: Leaving directory `/home/build/src/openbios/openbios-devel.pre/obj-ppc' make: *** [build] Error 1
So it looks like the issue is with something trying to access an unknown device in one of the initialisers invoked in openbios.dict. At some point I should probably add a mechanism to allow a Forth throw to set an error string which can be retrieved by the error handler - but that can wait a little while longer.
ATB,
Mark.