Years ago, when C was young and implementations were not readily available, people wrote compilers for subsets of C and called them small C compilers.
Currently one of the greatest challenges in LinuxBIOS is writing the RAM initialization code. On x86 this is all done in 8 general purpose registers, and it takes about 1000 lines of assembly per memory controller to do a good job and handle all of the auto-configuration cases. Debugging the assembly is painful, as adding the debug code frequently breaks the code we are trying to debug. Additionally, there is no code reuse between the two binaries that make up LinuxBIOS, as one is in C and the other in assembly.
By this point I have had enough practice writing the assembly code that for me it hardly matters. But I have seen several proficient coders just grind to a halt when confronted with miles of assembly.
So what I propose is to write a small C compiler for LinuxBIOS. Various reports from ages past put it at about a man-month of effort to get the first version going. And it may be better than that, as there are a couple of places I can copy code from: tcc, lcc, gcc, and other small C projects.
Up until this point all C compilers have assumed that you can have a stack. And most simple C compilers have kept none of the data structures needed to do inlining and similar optimizations that are necessary when you don't have RAM.
So I think it is easiest to build up something very simple from scratch. That is the route I am going to try. No promises, but I should start a trial implementation soon.
Ron. I am going to need to do a branch soon as I do the Hammer port and integrate a small C compiler. And I want to make some clean fixes and remove some of the backwards compatibility cruft, and the linuxbios 1.0 code base is an inappropriate place to target. It should be possible to roll in most of the improvements into the 1.0 codebase, but that will just increase the clutter of the codebase.
Eric
Would it be easier to design this into gcc as a separate architecture or would that be too unmanageable? It may or may not be of benefit to other embedded projects as well so there may be more help keeping it up to date if it was in the gcc tree. Then again, someone could modify it for their own needs and make it unusable for what it was intended.
GO
On Thursday 13 February 2003 00:29, Eric W. Biederman wrote:
Years ago, when C was young and implementations were not readily available, people wrote compilers for subsets of C and called them small C compilers.
Currently one of the greatest challenges in LinuxBIOS is writing the RAM initialization code. On x86 this is all done in 8 general purpose registers, and it takes about 1000 lines of assembly per memory controller to do a good job and handle all of the auto-configuration cases. Debugging the assembly is painful, as adding the debug code frequently breaks the code we are trying to debug. Additionally, there is no code reuse between the two binaries that make up LinuxBIOS, as one is in C and the other in assembly.
By this point I have had enough practice writing the assembly code that for me it hardly matters. But I have seen several proficient coders just grind to a halt when confronted with miles of assembly.
So what I propose is to write a small C compiler for LinuxBIOS. Various reports from ages past put it at about a man-month of effort to get the first version going. And it may be better than that, as there are a couple of places I can copy code from: tcc, lcc, gcc, and other small C projects.
Up until this point all C compilers have assumed that you can have a stack. And most simple C compilers have kept none of the data structures needed to do inlining and similar optimizations that are necessary when you don't have RAM.
So I think it is easiest to build up something very simple from scratch. That is the route I am going to try. No promises, but I should start a trial implementation soon.
Ron. I am going to need to do a branch soon as I do the Hammer port and integrate a small C compiler. And I want to make some clean fixes and remove some of the backwards compatibility cruft, and the linuxbios 1.0 code base is an inappropriate place to target. It should be possible to roll in most of the improvements into the 1.0 codebase, but that will just increase the clutter of the codebase.
Eric
Linuxbios mailing list Linuxbios@clustermatic.org http://www.clustermatic.org/mailman/listinfo/linuxbios
GNUOrder gnuorder@tampabay.rr.com writes:
Would it be easier to design this into gcc as a separate architecture or would that be too unmanageable? It may or may not be of benefit to other embedded projects as well so there may be more help keeping it up to date if it was in the gcc tree. Then again, someone could modify it for their own needs and make it unusable for what it was intended.
Beats me.. gcc is the one compiler of the free software compilers it might be possible to integrate with. I think it is a real pain to start with.
If someone knows more about gcc feel free to go ahead and try.
Mostly I am avoiding that route in the interests of simplicity and comprehensibility. Someone familiar with gcc might have a different take on the situation.
As things get a little farther I hope to publish the subset of C I plan on using, which can aid other attempts.
Eric
GNUOrder gnuorder@tampabay.rr.com writes:
Would it be easier to design this into gcc as a separate architecture or would that be too unmanageable?
Beats me.. gcc is the one compiler of the free software compilers it might be possible to integrate with. I think it is a real pain to start with.
Eric
Will you be able to use the gnu pre-processor unchanged? or adapt it? The macro expansion seems pretty important.
-Steve
"Steve M. Gehlbach" steve@nexpath.com writes:
GNUOrder gnuorder@tampabay.rr.com writes:
Would it be easier to design this into gcc as a separate architecture or would that be too unmanageable?
Beats me.. gcc is the one compiler of the free software compilers it might be possible to integrate with. I think it is a real pain to start with.
Eric
Will you be able to use the gnu pre-processor unchanged? or adapt it? The macro expansion seems pretty important.
It can be used unchanged, though I don't know if that is a reason to use it. Having a C preprocessor is definitely a requirement of the LinuxBIOS small C.
To get an idea of how small a fairly complete C compiler can get, look up tcc. It can very nearly compile the kernel and the source is only 200K or so. It currently does not optimize but...
Eric
On Thu, 2003-02-13 at 14:11, Steve M. Gehlbach wrote:
Will you be able to use the gnu pre-processor unchanged? or adapt it? The macro expansion seems pretty important.
FYI, I was reading that newer GCC versions (3.2?) have merged the preprocessor into the main parser.
On 19 Feb 2003, Jeremy Jackson wrote:
On Thu, 2003-02-13 at 14:11, Steve M. Gehlbach wrote:
Will you be able to use the gnu pre-processor unchanged? or adapt it? The macro expansion seems pretty important.
FYI, I was reading about newer GCC (3.2?) that have merged the preprocessor into the main parser.
I have been on the hunt for small c-like compilers. I have yet to find one that runs in the registers only, i.e. has an addressable memory of 16 words.
My concern about a full-blown c compiler is this: we are going to move from debugging 1000 or so lines of assembly to debugging the compiler, and shipping a full compiler with linuxbios, just to eliminate this 1000 or so lines of assembly. It seems hard to justify. Since we will be the probable only users of this compiler the support burden will fall on us. There are not that many people out there needing a compiler that does this "your memory is only your register set" capability.
Is there another way? Could we, for example, build a tool that would take a description of the actions for turning on memory and generate the code? This would be a specialized "little language". I'm looking for those too -- sort of a "meta assembler".
I once wrote an OS using a set of "algol-like" assembler macros. It wasn't perfect but the job of writing the OS was considerably reduced. Should we do this?
I think we would all like something better than assembly for the hard memory turn-on step, I am just not sure it is a C compiler.
thanks
ron
"Ronald G. Minnich" rminnich@lanl.gov writes:
On 19 Feb 2003, Jeremy Jackson wrote:
On Thu, 2003-02-13 at 14:11, Steve M. Gehlbach wrote:
Will you be able to use the gnu pre-processor unchanged? or adapt it? The macro expansion seems pretty important.
FYI, I was reading about newer GCC (3.2?) that have merged the preprocessor into the main parser.
I have been on the hunt for small c-like compilers. I have yet to find one that runs in the registers only, i.e. has an addressable memory of 16 words.
My concern about a full-blown c compiler is this: we are going to move from debugging 1000 or so lines of assembly to debugging the compiler, and shipping a full compiler with linuxbios, just to eliminate this 1000 or so lines of assembly.
To shipping a 1000 or so lines of C. And it is not a full compiler for all of C just an appropriate subset. And the output will be assembler.
It seems hard to justify. Since we will be the probable only users of this compiler the support burden will fall on us. There are not that many people out there needing a compiler that does this "your memory is only your register set" capability.
Nope.
Is there another way? Could we, for example, build a tool that would take a description of the actions for turning on memory and generate the code? This would be a specialized "little language". I'm looking for those too -- sort of a "meta assembler".
Personally I think it is harder to write a compiler for a declarative language than for an imperative language like C.
I once wrote an OS using a set of "algol-like" assembler macros. It wasn't perfect but the job of writing the OS was considerably reduced. Should we do this?
That is essentially what I am suggesting.
I think we would all like something better than assembly for the hard memory turn-on step, I am just not sure it is a C compiler.
Personally I don't think it will be that bad. Non-optimizing compilers are straightforward and pretty easy to get right. The basic trick is not to let the task be overwhelming.
The largest benefit comes from supporting a subset of C, as that allows code sharing. I currently have a month in my schedule to investigate that. If it looks too hard I will back off, but I intend to give it a good effort.
To my mind this problem is a lot like supporting printf. A full printf is scary and huge. A small subset of printf like we have now is not too bad and can be easily maintained. And it can be easily extended to add missing printf features if needed.
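To make the printf comparison concrete, here is a rough sketch of what a deliberately small printf subset can look like. It is not the LinuxBIOS printf; the names `out_buf`, `put_char`, `put_unsigned` and `mini_printf` are made up for this illustration, and it writes into a caller-supplied buffer instead of a console. It uses varargs and therefore a stack, so it only illustrates the "small, maintainable subset" argument, not register-only code.

```c
#include <stdarg.h>
#include <stddef.h>

/* Hypothetical output sink: a caller-supplied, NUL-terminated buffer. */
struct out_buf {
    char *buf;
    size_t len;   /* characters written so far */
    size_t cap;   /* capacity of buf, including the terminating NUL */
};

/* Append one character, silently dropping it if the buffer is full. */
static void put_char(struct out_buf *o, char c)
{
    if (o->len + 1 < o->cap) {
        o->buf[o->len++] = c;
        o->buf[o->len] = '\0';
    }
}

/* Print an unsigned value in base 10 or 16. */
static void put_unsigned(struct out_buf *o, unsigned long v, unsigned base)
{
    char tmp[32];
    int n = 0;

    do {
        tmp[n++] = "0123456789abcdef"[v % base];
        v /= base;
    } while (v != 0);
    while (n > 0)
        put_char(o, tmp[--n]);
}

/* Supports only %c, %s, %d, %u, %x and %%. No widths, no padding. */
static void mini_printf(struct out_buf *o, const char *fmt, ...)
{
    va_list ap;

    va_start(ap, fmt);
    for (; *fmt != '\0'; fmt++) {
        if (*fmt != '%') {
            put_char(o, *fmt);
            continue;
        }
        fmt++;
        if (*fmt == '\0')
            break;                      /* lone trailing '%': stop */
        switch (*fmt) {
        case 'c':
            put_char(o, (char)va_arg(ap, int));
            break;
        case 's': {
            const char *s = va_arg(ap, const char *);
            while (*s != '\0')
                put_char(o, *s++);
            break;
        }
        case 'd': {
            int d = va_arg(ap, int);
            unsigned long u = (unsigned long)d;
            if (d < 0) {
                put_char(o, '-');
                u = 0UL - u;            /* magnitude, safe for INT_MIN */
            }
            put_unsigned(o, u, 10);
            break;
        }
        case 'u':
            put_unsigned(o, va_arg(ap, unsigned int), 10);
            break;
        case 'x':
            put_unsigned(o, va_arg(ap, unsigned int), 16);
            break;
        case '%':
            put_char(o, '%');
            break;
        default:                        /* unknown conversion: echo it */
            put_char(o, '%');
            put_char(o, *fmt);
            break;
        }
    }
    va_end(ap);
}
```

Adding another conversion means adding one more `case`, which is the kind of incremental growth the printf comparison is about.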
In school I majored in languages and I wrote a C interpreter in a semester, and if I had another week or two it would have been a compiler as well. The nastiest most distracting part of a compiler is the parser, and that we can borrow from elsewhere. My intention is to use a recursive descent parser, so the compiler can be easily tweaked.
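For readers who have not seen one, a recursive descent parser has one function per grammar rule, which is why it is easy to tweak. The sketch below is only an illustration of the technique, not part of any LinuxBIOS compiler: it evaluates integer expressions directly instead of generating code, and the grammar and the names `parse_expr`, `parse_term`, `parse_factor` and `eval` are assumptions made for the example.

```c
#include <ctype.h>

/*
 * Grammar handled (one function per rule):
 *   expr   := term   (('+' | '-') term)*
 *   term   := factor (('*' | '/') factor)*
 *   factor := NUMBER | '(' expr ')'
 */
struct parser {
    const char *p;  /* current position in the input */
    int error;      /* set to 1 on the first syntax error */
};

static long parse_expr(struct parser *ps);

static void skip_ws(struct parser *ps)
{
    while (isspace((unsigned char)*ps->p))
        ps->p++;
}

static long parse_factor(struct parser *ps)
{
    long v = 0;

    skip_ws(ps);
    if (*ps->p == '(') {
        ps->p++;
        v = parse_expr(ps);
        skip_ws(ps);
        if (*ps->p == ')')
            ps->p++;
        else
            ps->error = 1;          /* missing closing parenthesis */
        return v;
    }
    if (!isdigit((unsigned char)*ps->p)) {
        ps->error = 1;              /* expected a number */
        return 0;
    }
    while (isdigit((unsigned char)*ps->p))
        v = v * 10 + (*ps->p++ - '0');
    return v;
}

static long parse_term(struct parser *ps)
{
    long v = parse_factor(ps);

    for (;;) {
        skip_ws(ps);
        if (*ps->p == '*') {
            ps->p++;
            v *= parse_factor(ps);
        } else if (*ps->p == '/') {
            long d;
            ps->p++;
            d = parse_factor(ps);
            if (d == 0) {
                ps->error = 1;      /* division by zero or bad operand */
                return 0;
            }
            v /= d;
        } else {
            return v;
        }
    }
}

static long parse_expr(struct parser *ps)
{
    long v = parse_term(ps);

    for (;;) {
        skip_ws(ps);
        if (*ps->p == '+') {
            ps->p++;
            v += parse_term(ps);
        } else if (*ps->p == '-') {
            ps->p++;
            v -= parse_term(ps);
        } else {
            return v;
        }
    }
}

/* Returns 0 and stores the value in *out on success, -1 on error. */
static int eval(const char *src, long *out)
{
    struct parser ps = { src, 0 };
    long v = parse_expr(&ps);

    skip_ws(&ps);
    if (ps.error || *ps.p != '\0')
        return -1;
    *out = v;
    return 0;
}
```

A compiler built this way would emit instructions at the points where this sketch does arithmetic; changing the accepted language means editing the function for the affected rule.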
And I guess I do not see the maintenance being a major issue. Small, simple, focused programs that don't handle the general case tend to be easier to write and maintain.
Eric
Below is a trivial C program with a function call and a few loops. When compiled with gcc using the stated flags, it does not use any stack. The assembler output is below.
/* sample program demonstrating trivial C program */
/* that doesn't use stack when compiled with gcc */

/* compile with gcc -O -fverbose-asm -fomit-frame-pointer -S -Winline */
/* -O optimization is required to honour inline keyword */
/* __attribute__((always_inline)); could be used if opt is bad */
/* -fverbose-asm helps when choosing compiler options */
/* -fomit-frame-pointer eliminates function prologue */
/* -S generates assembly, .c file becomes .s so you can inspect */
/* -Winline explains why if something can't be inlined */

extern volatile int r; /* example MMIO register, could be an io insn also */

static inline int afunc(const int x)
{
	register int y;
	for (y=0; y<x; y++) {
		r=y;
	}
}

int main ()
{
	register int i,j;

	for (i=0; i<255; j+=i++) {
		afunc(j);
	}
	return(j);
}

-----------------------------------
	.file	"test.c"
	.version	"01.01"
# GNU C version 2.95.4 20011002 (Debian prerelease) (i386-linux) compiled by GNU C version 2.95.4 20011002 (Debian prerelease).
# options passed:  -O -Winline -fomit-frame-pointer -fverbose-asm
# options enabled:  -fdefer-pop -fomit-frame-pointer -fthread-jumps
# -fpeephole -ffunction-cse -finline -fkeep-static-consts
# -fpcc-struct-return -fcommon -fverbose-asm -fgnu-linker -fargument-alias
# -fident -m80387 -mhard-float -mno-soft-float -mieee-fp -mfp-ret-in-387
# -mschedule-prologue -mcpu=i386 -march=i386

gcc2_compiled.:
.text
	.align 4
.globl main
	.type	 main,@function
main:
	xorl %ecx,%ecx
	.p2align 4,,7
.L12:
	xorl %edx,%edx
	cmpl %eax,%edx
	jge .L11
	.p2align 4,,7
.L15:
	movl %edx,r
	incl %edx
	cmpl %eax,%edx
	jl .L15
.L11:
	addl %ecx,%eax
	incl %ecx
	cmpl $254,%ecx
	jle .L12
	ret
.Lfe1:
	.size	 main,.Lfe1-main
	.ident	"GCC: (GNU) 2.95.4 20011002 (Debian prerelease)"
---------------------------
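The sample's comments mention `__attribute__((always_inline))` as a fallback when the optimizer declines to inline. A possible variant using it is sketched below, under a few assumptions: `r` is defined locally as a stand-in for the MMIO register so the file links on its own, `j` is initialized, `afunc` returns `void`, and the loop lives in a function called `run_loops` (a name invented here). `always_inline` is a GCC extension in later GCC releases than the 2.95.4 used above, and it only forces inlining; whether the result stays off the stack still has to be checked in the `-S` output.

```c
/* Stand-in for a memory-mapped register; real code would declare it extern. */
static volatile int r;

/* GCC extension: request inlining even when the optimizer would not. */
static inline __attribute__((always_inline)) void afunc(const int x)
{
    int y;

    for (y = 0; y < x; y++)
        r = y;              /* each store models a write to the device */
}

/* Same loop structure as the sample's main(), with j initialized. */
int run_loops(void)
{
    int i, j = 0;

    for (i = 0; i < 255; j += i++)
        afunc(j);
    return j;               /* 0 + 1 + ... + 254 */
}
```

Compiling this with the flags from the sample and grepping the assembly for `push` and `%esp` is the same check the thread uses.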
On Wed, 2003-02-19 at 10:42, Ronald G. Minnich wrote:
I have been on the hunt for small c-like compilers. I have yet to find one that runs in the registers only, i.e. has an addressable memory of 16 words.
My concern about a full-blown c compiler is this: we are going to move from debugging 1000 or so lines of assembly to debugging the compiler, and shipping a full compiler with linuxbios, just to eliminate this 1000 or so lines of assembly. It seems hard to justify. Since we will be the probable only users of this compiler the support burden will fall on us. There are not that many people out there needing a compiler that does this "your memory is only your register set" capability.
On 25 Feb 2003, Jeremy Jackson wrote:
Below is a trivial C program with a function call and a few loops. When compiled with gcc using the stated flags, it does not use any stack. The assembler output is below.
neat!
I wonder if we'll be able to work with the gcc community somehow to get what Eric needs with a standard gcc.
ron
"Ronald G. Minnich" rminnich@lanl.gov writes:
On 25 Feb 2003, Jeremy Jackson wrote:
Below is a trivial C program with a function call and a few loops. When compiled with gcc using the stated flags, it does not use any stack. The assembler output is below.
neat!
I wonder if we'll be able to work with the gcc community somehow to get what Eric needs with a standard gcc.
I suspect a careful port of gcc to x86-noram can handle it, and long term that is probably where the effort needs to go.
Short term I suspect there would be a lot of distraction with just the mechanics of gcc. If someone knows gcc better than me feel free.
For the first proof of concept pass I am going to tackle something simple and stand alone. When that works we will have something to use for the short term, and something we can point at and say: hey, it works. What will it take to get gcc to do something similar?
One of the very nice things about C is that practically every optimization can be done by hand. So a C compiler really only needs to do register allocation, instruction selection, and instruction scheduling. Which means you really don't need a compiler that is especially smart.
Anyway we will see in a little bit what makes the most sense.
Eric
Eric W. Biederman wrote:
Short term I suspect there would be a lot of distraction with just the mechanics of gcc. If someone knows gcc better than me feel free.
Eric
I think you have to cramp your C coding style anyway, to stay within registers. The extra scratch area does not help much; a chip such as the SiS630 only has three gp regs, which is little help. You can't go around declaring variables willy nilly, you run out of space (registers) no matter what compiler.
I also have been experimenting with inline gcc, and I think it works pretty well once you get the hang of it, and it _is_ easier than assy. I have re-coded the console routines, and ram setup, and spd timing setup on the sis630 for C. It is still a wip but I have tested the spd setup by wrapping a real main on a live machine.
So far it is about 400 lines of C, should I attach it? It compiles without using the stack (except for a %ebp push/pop which can be deleted).
I can't see how a special compiler gets you enough more than gcc to be worth the downside of effort and debugging. Scratch regs vary from chip to chip, and use 1.5 regs to access (pci). You could use the %ebp and %esp which is a gain of two (and maybe %es %fs etc), but I think using gcc will get us there anyway.
Just another opinion to put in the hat.
-Steve
I would like to see that code, thanks
ron
Ronald G. Minnich wrote:
I would like to see that code, thanks
ron
The original message is below since I think it did not post to the list due to my return email address (fixed that I think). The code is attached (6 files).
------
I think you have to cramp your C coding style anyway, to stay within registers. The extra scratch area does not help much; a chip such as the SiS630 only has three gp regs, which is little help. You can't go around declaring variables willy nilly, you run out of space (registers) no matter what compiler.
I also have been experimenting with inline gcc, and I think it works pretty well once you get the hang of it, and it _is_ easier than assy. I have re-coded the console routines, and ram setup, and spd timing setup on the sis630 for C. It is still a wip but I have tested the spd setup by wrapping a real main on a live machine.
So far it is about 400 lines of C, should I attach it? It compiles without using the stack (except for a %ebp push/pop which can be deleted).
I can't see how a special compiler gets you enough more than gcc to be worth the downside of effort and debugging. Scratch regs vary from chip to chip, and use 1.5 regs to access (pci). You could use the %ebp and %esp which is a gain of two (and maybe %es %fs etc), but I think using gcc will get us there anyway.
Just another opinion to put in the hat.
-Steve
//
// by S. Gehlbach <steve @ kesa . com>
//
// edit with tab stops =4
//
// for SiS630 spd interface
//
#ifndef COMMON
#include "common.h"
#endif

#define SMA_SIZE 0x80
#define ACPI_BASE_ADDR 0x5000
#define SMB_BASE_ADDR ACPI_BASE_ADDR

/* Define register offsets */
#define SMB_STS 0x80
#define SMB_ENB 0x81
#define SMB_CNT 0x82
#define SMBHOST_CNT 0x83
#define SMB_ADDR 0x84
#define SMB_CMD 0x85
#define SMB_PCOUNT 0x86
#define SMB_COUNT 0x87
#define SMB_DATA 0x88

/* Define register settings */
#define SMB_KILL 0x20
#define DIMM_BASE 0xa1	// 1010001 is base for DIMM in SMBus
#define READ_CMD 0x12	// start + R/W in CMD reg

/* Define SPD Data locations */
#define MEM_TYPE 2		// Memory Type - EDO, FPM, SDRAM
#define NUM_ROWS 3		// Number of Row Addresses
#define NUM_COLS 4		// Number of Column Addresses
#define NUM_MOD_ROWS 5	// Number of Module Rows
#define NUM_BANKS 17	// number of banks
#define CAS_LAT 18		// CAS Latencies Supported
#define MOD_ATTR 21		// SDRAM Module Attributes
#define BANK_DENSITY 31	// Module Bank Density
#define SPD_REV 62		// SPD Revision
#define SPEC_FREQ 126	// Specification Frequency - 66, 100, ...

// one bank
const char sdram_type_bank_1 [] = {
// Column Number 8 9 10 11 Row Number
	0x00, 0x04, 0x08, 0xff,	// 11
	0xff, 0xff, 0xff, 0xff,	// 12
	0x01, 0x05, 0x09, 0x0d	// 13
};

// two banks
const char sdram_type_bank_2 [] = {
// Column Number 8 9 10 11 Row Number
	0xc0, 0xff, 0xff, 0xff,	// 11
	0x02, 0x06, 0x0a, 0x0e,	// 12
	0x03, 0x07, 0x0b, 0x0f	// 13
};

extern inline unsigned char read_spd(unsigned char byte_number)
{
	outb(0xff,SMB_BASE_ADDR + SMB_STS);				// clear status
	outb(byte_number, SMB_BASE_ADDR + SMB_CMD);		// SMBus command
	outb(READ_CMD, SMB_BASE_ADDR + SMBHOST_CNT);	// SMBus Host Control, Start, R/W
	while ( !(0x0a & inb(SMB_BASE_ADDR + SMB_STS)) )
		;	// loop on status for error or done
	return inb(SMB_BASE_ADDR + SMB_DATA);
}

// return type made explicit; the original relied on implicit int
extern inline void set_ram (unsigned char slot)
{
	// issue smbus kill command
	outb(SMB_KILL, SMB_BASE_ADDR + SMBHOST_CNT);	// SMBus host control
	outb(0xff,SMB_BASE_ADDR + SMB_STS);				// clear status

	// address the bus
	// SPD is SMBus address 1010 xxxR where R=1 for read and xxx is dimm slot number
	outb(0xa1 + 2*slot,SMB_BASE_ADDR + SMB_ADDR);	// SMBus Address

	// read ram type (SPD byte 2)
	// skip if not SDRAM (type=4)
	if ( read_spd(MEM_TYPE) != 0x04 ) return;

	// above should fail on non-ex simm but just to be sure...
	// check the status
	if (inb(SMB_BASE_ADDR + SMB_STS) != 0x08) return;

	// need to ck and see if 0xff
	switch (read_spd(NUM_ROWS)) {
		case 11:
		case 12:
		case 13:
			break;
		default:
			// not in range, return
			return;
	}
	switch (read_spd(NUM_COLS)) {
		case 8:
		case 9:
		case 10:
		case 11:
			break;
		default:
			// not in range, return
			return;
	}

	//
	// note we read the spd more than once but we are using it as our scratch area
	// since we have no scratch space just registers.
	// some table values are out of range but we will check them later and not enable ram.
	//
	switch (read_spd(NUM_BANKS)) {

		case 1:
			pci_conf1_write_config_byte(
				sdram_type_bank_1[read_spd(NUM_ROWS)*4 + read_spd(NUM_COLS) - 52],
				CONFIG_CMD(NORTH_BRIDGE_BUS,PCI_DEVFN(NORTH_BRIDGE_DEV,NORTH_BRIDGE_FUN),0x60+slot)
			);
			break;
		default:
			pci_conf1_write_config_byte(
				sdram_type_bank_2[read_spd(NUM_ROWS)*4 + read_spd(NUM_COLS) - 52],
				CONFIG_CMD(NORTH_BRIDGE_BUS,PCI_DEVFN(NORTH_BRIDGE_DEV,NORTH_BRIDGE_FUN),0x60+slot)
			);
	}

	//
	// check illegal values
	// return on error does not enable dimm so effectively it is disabled
	//
	switch (pci_conf1_read_config_byte(CONFIG_CMD(NORTH_BRIDGE_BUS,PCI_DEVFN(NORTH_BRIDGE_DEV,NORTH_BRIDGE_FUN),0x60+slot)) & 0x80) {
		case 0x80:
			return;
		default:
			break;	// a label must be followed by a statement
	}

	// set module rows
	switch (read_spd(NUM_MOD_ROWS)) {
		case 2:
			pci_conf1_write_config_byte(
				1<<5 | pci_conf1_read_config_byte(CONFIG_CMD(NORTH_BRIDGE_BUS,PCI_DEVFN(NORTH_BRIDGE_DEV,NORTH_BRIDGE_FUN),0x60+slot)),
				CONFIG_CMD(NORTH_BRIDGE_BUS,PCI_DEVFN(NORTH_BRIDGE_DEV,NORTH_BRIDGE_FUN),0x60+slot)
			);
	}
	//
	// everything okay so set the enable bit
	//
	pci_conf1_write_config_byte(
		(1<<slot) | pci_conf1_read_config_byte(CONFIG_CMD(NORTH_BRIDGE_BUS,PCI_DEVFN(NORTH_BRIDGE_DEV,NORTH_BRIDGE_FUN),0x63)),
		CONFIG_CMD(NORTH_BRIDGE_BUS,PCI_DEVFN(NORTH_BRIDGE_DEV,NORTH_BRIDGE_FUN),0x63)
	);
}

extern inline void setup_chip ()
{
	// ACPI Enable
	pci_conf1_write_config_byte(
		0x8a,
		CONFIG_CMD(LPC_BRIDGE_BUS,PCI_DEVFN(LPC_BRIDGE_DEV,LPC_BRIDGE_FUN),0x40)
	);
	// Set ACPI Base I/O Address
	pci_conf1_write_config_byte(
		ACPI_BASE_ADDR>>16,
		CONFIG_CMD(LPC_BRIDGE_BUS,PCI_DEVFN(LPC_BRIDGE_DEV,LPC_BRIDGE_FUN),0x75)
	);
	// MDOE# enable, this bit should be set before sizing
	pci_conf1_write_config_byte(
		0x01,
		CONFIG_CMD(LPC_BRIDGE_BUS,PCI_DEVFN(LPC_BRIDGE_DEV,LPC_BRIDGE_FUN),0x55)
	);
	// initialize the dimm register
	pci_conf1_write_config_byte(SMA_SIZE, CONFIG_CMD(NORTH_BRIDGE_BUS,PCI_DEVFN(NORTH_BRIDGE_DEV,NORTH_BRIDGE_FUN),0x63));
	// check each dimm slot
	set_ram(0);
	set_ram(1);
	set_ram(2);
}
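The table lookup in `set_ram` uses the index `rows*4 + cols - 52`. Since rows run from 11 to 13 and columns from 8 to 11, that is the same as `(rows - 11)*4 + (cols - 8)`, which maps the twelve legal combinations onto indices 0 to 11. The sketch below spells this out as a plain function so the arithmetic can be checked on a host machine; the function name `sdram_type_one_bank` and the use of `uint8_t` are choices made for this example, and the table values are copied from the one-bank table above.

```c
#include <stdint.h>

/* One-bank SDRAM type table: rows 11..13 down, columns 8..11 across. */
static const uint8_t sdram_type_bank_1[12] = {
    0x00, 0x04, 0x08, 0xff,   /* 11 row address bits */
    0xff, 0xff, 0xff, 0xff,   /* 12 row address bits */
    0x01, 0x05, 0x09, 0x0d    /* 13 row address bits */
};

/*
 * Return the register value for a rows/cols pair, or 0xff when the
 * pair is outside the table or marked unsupported in it.
 */
static uint8_t sdram_type_one_bank(unsigned rows, unsigned cols)
{
    if (rows < 11 || rows > 13 || cols < 8 || cols > 11)
        return 0xff;
    /* (rows - 11) * 4 + (cols - 8) == rows * 4 + cols - 52 */
    return sdram_type_bank_1[rows * 4 + cols - 52];
}
```

For example, 13 rows and 11 columns give index 13*4 + 11 - 52 = 11, the last entry. The 0xff entries have bit 7 set, which is what the later `& 0x80` check in `set_ram` relies on to leave an unsupported DIMM disabled.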
//
// by S. Gehlbach
// edit with :set ts=4
// common routines that can be inlined
//
// WARNING: all value -> register & pci routines have been coded
// to standard AT&T direction, ie, out(value,reg) etc.
//
#define COMMON

#define PCI_DEVFN(slot,func) ((((slot) & 0x1f) << 3) | ((func) & 0x07))
#define PCI_SLOT(devfn) (((devfn) >> 3) & 0x1f)
#define PCI_FUNC(devfn) ((devfn) & 0x07)

#define NORTH_BRIDGE_BUS 0
#define NORTH_BRIDGE_DEV 0
#define NORTH_BRIDGE_FUN 0
#define NORTH_BRIDGE_DEVFN PCI_DEVFN(NORTH_BRIDGE_DEV,NORTH_BRIDGE_FUN);

#define LPC_BRIDGE_BUS 0
#define LPC_BRIDGE_DEV 1
#define LPC_BRIDGE_FUN 0
#define LPC_BRIDGE_DEVFN PCI_DEVFN(LPC_BRIDGE_DEV,LPC_BRIDGE_FUN);

#define PCI_PCI_BRIDGE_BUS 0
#define PCI_PCI_BRIDGE_DEV 2
#define PCI_PCI_BRIDGE_FUN 0
#define PCI_PCI_BRIDGE_DEVFN PCI_DEVFN(PCI_PCI_BRIDGE_DEV,PCI_PCI_BRIDGE_FUN);

#define CONFIG_CMD(bus,devfn, where) (0x80000000 | (bus << 16) | (devfn << 8) | where )

extern inline void outb (unsigned char value, unsigned short port)
{
	asm volatile ("outb %b0,%w1": :"a" (value), "Nd" (port));
}

extern inline void outl (unsigned int value, unsigned short port)
{
	asm volatile ("outl %0,%w1": :"a" (value), "Nd" (port));
}

extern inline unsigned char inb (unsigned short port)
{
	unsigned char _v asm ("al");
	asm volatile ("inb %w1,%0":"=a" (_v):"Nd" (port));
	return _v;
}

static inline void pci_conf1_write_config_byte(unsigned char value, int addr)
{
	outl(addr & ~3, 0xCF8);
	outb(value, 0xCFC + (addr & 3));
}

static inline unsigned char pci_conf1_read_config_byte(int addr)
{
	outl(addr & ~3, 0xCF8);
	return inb(0xCFC + (addr & 3));
}
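To see what the `CONFIG_CMD` and `PCI_DEVFN` macros and the two `pci_conf1_*` helpers compute, the address arithmetic can be written out as ordinary functions with no port I/O. This is only a host-side sketch: the function names `pci_devfn`, `pci_config_addr`, `cf8_value` and `cfc_port` are invented for the example and do not appear in the attached code.

```c
#include <stdint.h>

/* Same packing as PCI_DEVFN: 5 bits of device, 3 bits of function. */
static uint32_t pci_devfn(unsigned dev, unsigned func)
{
    return ((dev & 0x1f) << 3) | (func & 0x07);
}

/* Same layout as CONFIG_CMD: enable bit, bus, devfn, register offset. */
static uint32_t pci_config_addr(unsigned bus, unsigned devfn, unsigned reg)
{
    return 0x80000000u | ((uint32_t)bus << 16) | ((uint32_t)devfn << 8) | reg;
}

/* Value written to port 0xCF8: the address with the low two bits cleared. */
static uint32_t cf8_value(uint32_t addr)
{
    return addr & ~3u;
}

/* Data port for a byte access: 0xCFC plus the low two address bits. */
static uint16_t cfc_port(uint32_t addr)
{
    return (uint16_t)(0xCFC + (addr & 3));
}
```

Taking the LPC bridge (bus 0, device 1, function 0) and register 0x75 as used in `setup_chip`: the devfn is 8, the address is 0x80000875, so 0x80000874 goes to port 0xCF8 and the byte is accessed at port 0xCFD.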
//
// by. S. Gehlbach
// console routines
// edit with :set ts=4
//
#define asm __asm__
#define volatile __volatile__
#define inline __inline__

/* Base Address */
#ifndef TTYS0_BASE
#define TTYS0_BASE 0x3f8
#endif

#ifndef TTYS0_BAUD
#define TTYS0_BAUD 115200
#endif

#if ((115200%TTYS0_BAUD) != 0)
#error Bad ttys0 baud rate
#endif

#define TTYS0_DIV (115200/TTYS0_BAUD)

/* Line Control Settings */
#ifndef TTYS0_LCS
/* Set 8bit, 1 stop bit, no parity */
#define TTYS0_LCS 0x3
#endif

#define UART_LCS TTYS0_LCS

/* Data */
#define UART_RBR 0x00
#define UART_TBR 0x00

/* Control */
#define UART_IER 0x01
#define UART_IIR 0x02
#define UART_FCR 0x02
#define UART_LCR 0x03
#define UART_MCR 0x04
#define UART_DLL 0x00
#define UART_DLM 0x01

/* Status */
#define UART_LSR 0x05
#define UART_MSR 0x06
#define UART_SCR 0x07

#ifndef COMMON
#include "common.h"
#endif

extern inline int uart_can_tx_byte()
{
	return inb(TTYS0_BASE + UART_LSR) & 0x20;
}

extern inline void uart_wait_to_tx_byte()
{
	while(!uart_can_tx_byte())
		;
}

extern inline void uart_wait_until_sent()
{
	while(!(inb(TTYS0_BASE + UART_LSR) & 0x40))
		;
}

extern inline void uart_tx_byte(unsigned char data)
{
	uart_wait_to_tx_byte();
	outb(data, TTYS0_BASE + UART_TBR);
	/* Make certain the data clears the fifos */
	uart_wait_until_sent();
}

extern inline int uart_have_rx_byte()
{
	return inb(TTYS0_BASE + UART_LSR) & 0x1;
}

extern inline void uart_enable_rx_byte()
{
	/* say we are ready for a byte */
	outb(0x02 | inb(TTYS0_BASE + UART_MCR), TTYS0_BASE + UART_MCR);
}

extern inline void uart_disable_rx_byte()
{
	/* say we aren't ready for another byte */
	outb(~0x02 & inb(TTYS0_BASE + UART_MCR), TTYS0_BASE + UART_MCR);
}

extern inline void uart_wait_for_rx_byte()
{
	uart_enable_rx_byte(TTYS0_BASE);
	while(!uart_have_rx_byte(TTYS0_BASE))
		;
	uart_disable_rx_byte();
}

extern inline unsigned char uart_rx_byte()
{
	static unsigned char data asm ("cl");
	if (!uart_have_rx_byte(TTYS0_BASE)) {
		uart_wait_for_rx_byte(TTYS0_BASE);
	}
	data = inb(TTYS0_BASE + UART_RBR);
	return data;
}

extern inline void uart_init()
{
	/* disable interrupts */
	outb(0x0, TTYS0_BASE + UART_IER);
	/* enable fifo's */
	outb(0x01, TTYS0_BASE + UART_FCR);
	/* Set Baud Rate Divisor to 12 ==> 115200 Baud */
	outb(0x80 | UART_LCS, TTYS0_BASE + UART_LCR);
	outb(TTYS0_DIV & 0xFF, TTYS0_BASE + UART_DLL);
	outb((TTYS0_DIV >> 8) & 0xFF, TTYS0_BASE + UART_DLM);
	outb(UART_LCS, TTYS0_BASE + UART_LCR);
}

extern inline void uart_tx_bytes(char *data, unsigned len)
{
	do {
		uart_wait_to_tx_byte();
		outb(*data, TTYS0_BASE + UART_TBR);
		data++;
		len--;
	} while(len);
	uart_wait_until_sent();
}

extern inline unsigned long uart_rx_bytes(char * buffer, unsigned long size)
{
	unsigned long bytes asm ("ecx");
	bytes = 0;
	if (size == 0) {
		return 0;
	}
	if (!uart_have_rx_byte()) {
		uart_wait_for_rx_byte();
	}
	do {
		buffer[bytes++] = inb(TTYS0_BASE + UART_RBR);
	} while((bytes < size) && uart_have_rx_byte());
	return bytes;
}
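The divisor arithmetic behind `TTYS0_DIV` and the two divisor-latch writes in `uart_init` can be checked on a host with a few small functions. This is a sketch for illustration only; `uart_divisor`, `divisor_low` and `divisor_high` are names made up here. Note that with these formulas the default `TTYS0_BAUD` of 115200 gives a divisor of 1, while a divisor of 12 corresponds to 9600 baud.

```c
/*
 * Divisor for a PC-style UART clocked for a maximum of 115200 baud.
 * Returns 0 when the rate does not divide 115200 exactly, mirroring
 * the "#error Bad ttys0 baud rate" check in the header.
 */
static unsigned uart_divisor(unsigned long baud)
{
    if (baud == 0 || 115200UL % baud != 0)
        return 0;
    return (unsigned)(115200UL / baud);
}

/* Byte written to the low divisor latch (UART_DLL). */
static unsigned char divisor_low(unsigned div)
{
    return (unsigned char)(div & 0xFF);
}

/* Byte written to the high divisor latch (UART_DLM). */
static unsigned char divisor_high(unsigned div)
{
    return (unsigned char)((div >> 8) & 0xFF);
}
```

For a slow rate such as 50 baud the divisor is 2304 (0x0900), so the low latch gets 0x00 and the high latch 0x09, which is why both bytes are written.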
//
// edit with tab stops = 4 (:set ts=4)
// main setup
// compile with:
// gcc -O3 -S -fcall-used-esi -fcall-used-edi -fcall-used-ebx -fomit-frame-pointer -fverbose-asm -Winline
// check for stack pushes with "grep esp setup.s" and "grep push setup.s".
// Should only push/pop ebp which can be deleted.
//

#include "console.h"
#include "raminit.c"
#include "chipinit.c"

int main ()
{
	static char msg1[] = "Test output";
	static char msg2[] = "Test output2";
	static char msg3[] = "Test output3";
	uart_init();
	uart_tx_bytes(msg1,sizeof(msg1));
	setup_chip();
	uart_tx_bytes(msg2,sizeof(msg2));
	setup_ram();
	uart_tx_bytes(msg3,sizeof(msg3));
}
/*
 * 630_regs.h : Register Recommended Setting for SiS 630 [ABE][01]
 *
 *
 * Copyright 2000 Silicon Integrated Systems Corporation
 *
 * This program is free software; you can redistribute it and/or modify
 * it under the terms of the GNU General Public License as published by
 * the Free Software Foundation; either version 2 of the License, or
 * (at your option) any later version.
 *
 * This program is distributed in the hope that it will be useful,
 * but WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
 * GNU General Public License for more details.
 *
 * You should have received a copy of the GNU General Public License
 * along with this program; if not, write to the Free Software
 * Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
 *
 *
 * Reference:
 * 1. SiS 630 Specification
 * 2. SiS 630A1 Register Recommended Setting
 *    Rev 0.97, Jan 7, 2000
 *
 * $Id$
 *
 * modd by S. Gehlbach <steve @ kesa . com>
 */

const unsigned char northbridge_init_table[] = {
	// Value -> Register
	/* generic PCI configuration space */
	0x07, 0x04,	// Turn on Bus Master,
	0x00, 0x05,	// Memory and I/O Space
	0x00, 0x06,	// clear PCI status
	0x00, 0x07,	//
	0x20, 0x0D,	// Master Latency Timer = 32 PCI CLCK

	/* SiS 630 specific registers. See SiS 630 Registers Recommended Setting */
	/* Host Control Interface */
#ifdef ENABLE_SIS630_CPU_PIPELINE
	0x9E, 0x50,	//
#else
	/* CPU Pipeline should be disable if on Coppermine CPU
	 * or more than 1 Bus Master LAN */
	0x9C, 0x50,	//
#endif
	0x00, 0x51,	//

	/* DRAM Control */
	0xC5, 0x52,	//
	0x00, 0x53,	//

	0x00, 0x54,	// 0x00 -> 66/100 MHZ, 0x08 -> 133 MHZ
	0x29, 0x55,	// 0x29 -> 66/100 MHZ, 0x1D -> 133 MHZ
	0x00, 0x56,	// 0x00 -> 66 MHZ, 0x80 -> 100/133 MHZ
	0x00, 0x57,	// 0x00 -> 100 MHZ 0x01 -> 133 MHZ

	/* Pre-driver Slew Rate/Current Driving Control */
	0x00, 0x58,	//
	0x35, 0x59,	//
	0x51, 0x5A,	//
	0x00, 0x5B,	//

	0x00, 0x65,	// Use DIMM 0 for SMA

	/* MISC Control */
	0xC0, 0x6A,	//
	0x01, 0x6B,	// 0x00 -> 66/133 MHZ, 0x01 -> 100 MHZ
	0x20, 0x6C,	// 0x2E -> 66 MHZ, 0x20 -> 100 MHZ, 0x2C -> 133 MHZ

	/* PCI Interface */
	0x21, 0x80,	//
	0xFF, 0x81,	//
	0x7F, 0x82,	//
	0x1E, 0x83,	//

	0x60, 0x84,	//
	0x00, 0x85,	//
	0x03, 0x86,	//
	0x40, 0x87,	//

	0x00, 0x88,	//
	0x08, 0x89,	//

	/* AGP GART Base Address */
	0x00, 0x90,
	0x00, 0x91,
	0x00, 0x92,
	0x00, 0x93,

	/* Graphic Window Size */
	0x40, 0x94,	// Graphic Window Size == 64MB
	0x01, 0x97,	// Page Table Cache Enable
	0x00, 0x98,	//
	0x02, 0x9C,	// Integrated VGA Enable

	/* DRAM Priority Timer Control Register */
	0x00, 0xA0,
	0x00, 0xA1,
	0x03, 0xA2,
	0x01, 0xA3,

	/* AGP Command Register */
	0x04, 0xC8,	// AGP 4X
	0x00, 0xC9	// AGP Disabled
};
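A value/register table like this one is normally consumed by a short loop that walks it two bytes at a time. The attachment does not include that loop, so the following is only a guess at its shape: `apply_init_table` is a hypothetical name, and the PCI write is replaced by a store into `fake_config_space` (also hypothetical) so the logic can be run on a host. The three table entries are copied from the table above.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the northbridge's 256-byte config space. */
static uint8_t fake_config_space[256];

/* Stand-in for pci_conf1_write_config_byte(value, CONFIG_CMD(..., reg)). */
static void fake_write_config_byte(uint8_t value, uint8_t reg)
{
    fake_config_space[reg] = value;
}

/* Walk a table of (value, register) pairs; a trailing odd byte is ignored. */
static void apply_init_table(const uint8_t *table, size_t len)
{
    size_t i;

    for (i = 0; i + 1 < len; i += 2)
        fake_write_config_byte(table[i], table[i + 1]);
}

/* A few entries in the same "Value -> Register" order as the real table. */
static const uint8_t sample_table[] = {
    0x07, 0x04,   /* Turn on Bus Master */
    0x20, 0x0D,   /* Master Latency Timer */
    0xC5, 0x52    /* DRAM Control */
};
```

Calling `apply_init_table(sample_table, sizeof sample_table)` leaves 0x07 at offset 0x04, 0x20 at 0x0D and 0xC5 at 0x52. In the ROM stage the table lives in flash, so the loop needs only a pointer and a counter in registers.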
On Tue, 2003-02-25 at 13:09, Steve Gehlbach wrote:
I think you have to cramp your C coding style anyway, to stay within registers. The extra scratch area does not help much, with chips such
A compiler *made* to compile this way could help a great deal by scheduling instructions like an RPN calculator though, so *you* don't have to write in spaghetti code.
BUT, would C's block-scope local variables allow registers to be re-used by different local variables, so instead of
void func (void)
{
	register int i;

	/* use i for one purpose */

	/* use same i for something else */
}
you could do
void func (void)
{
	{
		register int ramsize;
		/* use ramsize */
	}
	{
		register int cpuid;
		/* use cpuid */
	}
}
and have both end up using the same register, kind of like a union, but still looking more like C than assembler.
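A compilable version of that idea might look like the sketch below. Everything in it is assumed for illustration: the function name `probe`, its arguments, and the `sink` variable that stands in for a hardware register. Because the two variables have lifetimes that cannot overlap, a compiler is free to place them in the same register, but nothing in the C language guarantees it, so the generated assembly still needs to be inspected.

```c
/* Stand-in for a device register, so the stores are not optimized away. */
static volatile unsigned sink;

unsigned probe(unsigned a, unsigned b)
{
    unsigned result = 0;

    {
        unsigned ramsize = a * 2;   /* live only inside this block */
        sink = ramsize;
        result += ramsize;
    }
    {
        unsigned cpuid = b + 1;     /* may reuse ramsize's register */
        sink = cpuid;
        result += cpuid;
    }
    return result;
}
```

The benefit over reusing a single variable is mainly readability: each value keeps a meaningful name for as long as it exists.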
as the SiS630, only has three gp regs, little help. You can't go around declaring variables willy nilly, you run out of space (registers) no matter what compiler.
But for the cases where (in the chipset or wherever) there are scratch registers, global extern static variables that are fixed when linking (or defined in an assembler stub with .org or such) would allow them to be used easily from C.
I also have been experimenting with inline gcc, and I think it works pretty well and once you get the hang of it, and it _is_ easier than assy. I have re-coded the console routines, and ram setup, and spd timing setup on the sis630 for C. It is still a wip but I have tested the spd setup by wrapping a real main on a live machine.
So far it is about 400 lines of C, should I attach it? It compiles without using the stack (except for a %ebp push/pop which can be deleted).
Please do. It would be a good example of how complex the code can be with the register/inline constraints.
As for the frame pointer (you mean at the start of the first function), take a look at my sample, have you tried -fomit-frame-pointer?
I can't see how a special compiler gets you enough more than gcc to be worth the downside of effort and debugging. Scratch regs vary from chip to chip, and use 1.5 regs to access (pci). You could use the %ebp and %esp which is a gain of two (and maybe %es %fs etc), but I think using gcc will get us there anyway.
Looking at -fcall-saved-REG it seems that %esp and %ebp may be off limits for gcc, but the info docs are a little vague, perhaps it must be tried to be proven one way or the other. Inline assembly may allow them to be used, but that would almost be back where we started.
On 25 Feb 2003, Jeremy Jackson wrote:
BUT, would C's block-scope local variables allow registers to be re-used by different local variables, so instead of

void func (void)
{
        { register int ramsize; /* use ramsize */ }
        { register int cpuid; /* use cpuid */ }
}
that should work very well.
ron
Jeremy Jackson wrote:
So far it is about 400 lines of C, should I attach it? It compiles without using the stack (except for a %ebp push/pop which can be deleted).
Please do. It would be a good example of how complex the code can be with the register/inline constraints.
I thought I did attach it. Did the list delete the attachments?
-Steve
Jeremy Jackson jerj@coplanar.net writes:
On Tue, 2003-02-25 at 13:09, Steve Gehlbach wrote:
I think you have to cramp your C coding style anyway, to stay within registers. The extra scratch area does not help much, with chips such
A compiler *made* to compile this way could help a great deal by scheduling instructions like an RPN calculator though, so *you* don't have to write in spaghetti code.
So far I like the extern inline aspect in that it makes a good proof of concept that this code can be written in C, and that it is not so outrageous to expect a compiler to do it.
Steve, for a feel of my worries try compiling that code with gcc-3.3. If what I saw earlier today is right it won't work because someone has decided that aggressive inlining is a bad thing...
BUT, would C's block-scope local variables allow registers to be re-used
by different local variables, so instead of
void func (void)
{
        register int i;

        /* use i for one purpose */

        /* use same i for something else */
}

you could do

void func (void)
{
        { register int ramsize; /* use ramsize */ }
        { register int cpuid; /* use cpuid */ }
}
and have both end up using the same register, kind of like a union, but still looking more like C than assembler.
Yes. That is basic live variable analysis. In general a value not a variable gets assigned to a register.
The rule is that if a variable is assigned a new value before the old value is used again, the variable can be dropped in the intervening space, and afterwards a new register can be assigned.
It can go as far as a new register assignment every time you change the value of a variable.
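As a rough illustration of the live-range idea above (this is not the compiler being discussed; `struct live_range`, `assign_registers` and the 16-register cap are made up for the sketch), each value gets an interval from its definition to its last use, and two values can share a register whenever their intervals do not overlap:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical live range: the value is defined at instruction `def`
 * and read for the last time at instruction `last_use`. */
struct live_range {
    int def;
    int last_use;
    int reg;    /* assigned register index, or -1 if no register was free */
};

/* Assign registers to ranges that are sorted by `def`.
 * A register becomes free once the value it holds has passed its last use.
 * Returns the number of ranges that could not get a register. */
static int assign_registers(struct live_range *r, size_t n, int nregs)
{
    int busy_until[16];     /* last_use of the value currently in each register */
    int spills = 0;

    assert(nregs > 0 && nregs <= 16);
    for (int k = 0; k < nregs; k++)
        busy_until[k] = -1; /* -1 means the register has never been used */

    for (size_t i = 0; i < n; i++) {
        r[i].reg = -1;
        for (int k = 0; k < nregs; k++) {
            /* Free if the previous occupant died before this definition. */
            if (busy_until[k] < r[i].def) {
                r[i].reg = k;
                busy_until[k] = r[i].last_use;
                break;
            }
        }
        if (r[i].reg < 0)
            spills++;
    }
    return spills;
}
```

With ranges like the ramsize/cpuid example, where one value is dead before the other is defined, both end up in the same register; two values that are live at the same time with only one register available produce a spill, which is the case a stackless compiler has to reject or work around.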
as the SiS630, only has three gp regs, little help. You can't go around declaring variables willy nilly, you run out of space (registers) no matter what compiler.
But for the cases where (in the chipset or wherever) there are scratch registers, global extern static variables that are fixed when linking, (or define them in an assembler stub with .org or such) would allow them to be used easily from C.
That definitely requires some thinking. When it requires extra registers to spill a register it can be hard for a compiler. I am going to start with the spill free case...
So far it is about 400 lines of C, should I attach it? It compiles without using the stack (except for a %ebp push/pop which can be deleted).
Please do. It would be a good example of how complex the code can be with the register/inline constraints.
It has already shown up on the list, but watch out for the inline part that says it is a pixmap when it should be text/plain.
Eric
Eric W. Biederman wrote:
Steve, for a feel of my worries try compiling that code with gcc-3.3. If what I saw earlier today is right it won't work because someone has decided that aggressive inlining is a bad thing...
Hmm... I assume you mean gcc-3.2.2 unless you have a pre-release. I have gcc3.2 on a RH machine, it seems to work on my code, although I had to delete a couple of asm's and fiddle with raminit.c: went back to "register int i asm ("ecx");". Otherwise gcc3.2 was more efficient (-181 lines), and it uses the %ebp differently. As a result, checking for "push" or "esp" is not effective to determine spill cases; you have to look for %ebp, which it uses for tmp storage with mov's.
Interesting that gcc3.2 seems to ignore the -fomit-frame-pointer since there is a "leave" at the end. Not sure why.
Note the reason for the push of the %ebp in gcc2.95 is that 2.95 uses it for temp storage (hadn't noticed this before) if you set -fomit-frame-pointer, and so it push/pops it on entry and leaving.
-Steve
Steve Gehlbach steve@nexpath.com writes:
Eric W. Biederman wrote:
Steve, for a feel of my worries try compiling that code with gcc-3.3. If what I saw earlier today is right it won't work because someone has decided that aggressive inlining is a bad thing...
Hmm... I assume you mean gcc-3.2.2 unless you have a pre-release.
I don't currently have a pre-release but that is what I was thinking about. But 3.3 is scheduled to release in the next couple of weeks. And one of the issues that was being vigorously discussed on the gcc list was how to handle the fact that 3.3 does not inline nearly as much as 3.2 and there was not currently a way to get it to inline more than it thought was reasonable.
I have gcc3.2 on a RH machine, it seems to work on my code, although I had to delete a couple of asm's and fiddle with raminit.c: went back to "register int i asm ("ecx");". Otherwise gcc3.2 was more efficient (-181 lines), and it uses the %ebp differently. As a result, checking for "push" or "esp" is not effective to determine spill cases; you have to look for %ebp, which it uses for tmp storage with mov's.
The requirements of these small tweaks when building are why I am convinced that in the long run we need actual compiler support. Either in gcc or in one we write ourselves. For now we have an easier way to generate the assembly code.
Interesting that gcc3.2 seems to ignore the -fomit-frame-pointer since there is a "leave" at the end. Not sure why.
Note the reason for the push of the %ebp in gcc2.95 is that 2.95 uses it for temp storage (hadn't noticed this before) if you set -fomit-frame-pointer, and so it push/pops it on entry and leaving.
That makes sense.
Eric
Eric W. Biederman wrote:
The requirements of these small tweaks when building are why I am convinced that in the long run we need actual compiler support. Either in gcc or in one we write ourselves. For now we have an easier way to generate the assembly code.
Actually I don't mind the inline tweaking approach and I think it is easier to write and debug than assy, but others will disagree.
My concern is that a register-only compiler may be an intractable problem, and any way you go at it (custom or .md file for gcc) you will have to tweak and craft the C code just as much. If in fact this is true, then the benefit over inlining gcc seems minimal. Wish there were a way to answer this question before you invest a month of time.
-Steve
Steve Gehlbach steve@nexpath.com writes:
Eric W. Biederman wrote:
The requirements of these small tweaks when building are why I am convinced that in the long run we need actual compiler support. Either in gcc or in one we write ourselves. For now we have an easier way to generate the assembly code.
Actually I don't mind the inline tweaking approach and I think it is easier to write and debug than assy, but others will disagree.
The only part I don't like is that you have to keep the .S files around...
My concern is that a register-only compiler may be an intractable problem, and any way you go at it (custom or .md file for gcc) you will have to tweak and craft the C code just as much.
It should have problems only when you run out of registers, which I think is a rather different problem.
If in fact this is true, then the benefit over inlining gcc seems minimal. Wish there were a way to answer this question before you invest a month of time.
Oh. I think I can have some answers in less than a month. That is just what I have allocated for investigation. But it will take a week or so to get something working well enough to compare with the inline tweaking approach.
Plus right now I am enjoying the change of pace.
Eric
Eric W. Biederman wrote:
Oh. I think I can have some answers in less than a month. That is just what I have allocated for investigation. But it will take a week or so to get something working well enough to compare with the inline tweaking approach.
Plus right now I am enjoying the change of pace.
Cool, let's see how it goes. Feed it the decode routine from nrv2b.c and see how it likes that. gcc is a long ways from doing this in registers without a major re-write, which is probably about as much work as your implementation in assy. Very clever assy code, BTW.
Also I think the .S files don't have to be kept if we use a trivial post processing program like below.
-Steve
#!/usr/bin/perl
#
# program to process a .s file generated by gcc
# to eliminate gcc gingerbread so the file
# can be used for linuxbios.
# Also check for register spillage and use of the
# stack since this code is intended to run without
# ram.
#
# GPL for the linuxbios project
#
# by. Steve M. Gehlbach (steve @ kesa . com)
#

$ret = 0;
while (<STDIN>) {
    #
    # save everything and go over it
    # once to check for errors
    #
    $line = $_;
    # get everything up to comment
    # we don't want to bail out on items
    # in comments.
    $_ = (split(/#/))[0];
    next unless ($_);
    #
    # now check for things that indicate that
    # the stack is being used, which means that gcc
    # failed to fully inline code
    # with no stack pushes
    &Abort if /(%esp)/;
    if (/(%ebp)/) { &Abort unless /%esp/; }
    &Abort if /\scall\s/;
    &Abort if /.Lfe2:/;
    /pushl/ && ($line = '#'.$line);
    /pop/ && ($line = '#'.$line);
    /leave/ && ($line = '#'.$line);
    /%esp/ && ($line = '#'.$line);
    /ret/ && do {
        # only one return should be there
        &Abort if ($ret);
        # convert the last return to a jump in case
        # ret is in middle of code; rel jump okay?
        $line =~ s/ret/jmp .Lfe1/;
        $ret++;
    };
    # save the processed line
    push @lines,$line;
}
#
# okay so print it out
#
while ($_ = shift @lines) { print $_; }
exit (0);
#
# error print it with gas pseudo-ops
# and bail
#
sub Abort {
    chomp $line;
    # inner quotes are escaped so gas sees a quoted .print string
    print ".print \"$line <<< ERROR*** gcc uses stack or failed to inline code!\"\n";
    print ".err\n";
    exit (1);
}
Another idea on the subject:
On some chipsets, the entire range of supported processors have MMX and maybe SSE/SSE2. MMX gives 8 extra 64 bit registers (nobody uses floating-point in LinuxBIOS, right?) that can be used as 32bit. They can't be used as address/index/base, but only for data. See the MOVD instruction.
Newer versions of gcc offer to produce code which uses them with -mmmx, -m3dnow, or at least inline asm can use the registers. Might be nice for checksums with the vector instructions also.
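A minimal sketch of the MOVD idea, assuming GCC-style inline asm on an x86 CPU with MMX. The helper names (`scratch_store`, `scratch_load`, `scratch_done`, `scratch_roundtrip`) are invented for illustration, and the non-x86 branch is only there so the example compiles elsewhere:

```c
#include <stdint.h>

#if defined(__i386__) || defined(__x86_64__)
/* Park a 32-bit value in MMX register %mm0 (data only, not usable
 * as an address, index or base register). */
static inline void scratch_store(uint32_t v)
{
    __asm__ volatile ("movd %0, %%mm0" : : "r" (v) : "mm0");
}

/* Fetch the low 32 bits of %mm0 back into a general purpose register. */
static inline uint32_t scratch_load(void)
{
    uint32_t v;
    __asm__ volatile ("movd %%mm0, %0" : "=r" (v));
    return v;
}

/* MMX registers alias the x87 stack; emms hands it back to the FPU. */
static inline void scratch_done(void)
{
    __asm__ volatile ("emms");
}
#else
/* Stand-in for targets without MMX: an ordinary static variable. */
static uint32_t scratch_fallback;
static inline void scratch_store(uint32_t v) { scratch_fallback = v; }
static inline uint32_t scratch_load(void) { return scratch_fallback; }
static inline void scratch_done(void) { }
#endif

/* Round-trip a value through the scratch register. */
static uint32_t scratch_roundtrip(uint32_t v)
{
    uint32_t out;

    scratch_store(v);
    out = scratch_load();
    scratch_done();
    return out;
}
```

In pre-RAM code the store and the load would be far apart, with the MMX register acting as a spill slot in place of the stack; here they sit back to back only so the round trip can be checked on a hosted system.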
On Wed, 2003-02-26 at 23:22, Steve Gehlbach wrote:
Jeremy Jackson jerj@coplanar.net writes:
Another idea on the subject:
On some chipsets, the entire range of supported processors have MMX and maybe SSE/SSE2. MMX gives 8 extra 64 bit registers (nobody uses floating-point in LinuxBIOS, right?) that can be used as 32bit. They can't be used as address/index/base, but only for data. See the MOVD instruction.
Newer versions of gcc offer to produce code which uses them with -mmmx, -m3dnow, or at least inline asm can use the registers. Might be nice for checksums with the vector instructions also.
Nice. I had missed the fact that you can use MOVD with MMX registers. I have used it just a little bit when debugging to access the SSE registers but I didn't need them in production so they did not get used.
I think this is almost justification enough to write a new compiler. 16 extra 32 bit registers that you can use for scratch values on modern cpus. And 8 extra registers on the older cpus.
And using them can be a compile option so the code does not get polluted with strange assumptions.
I won't support those in the very first pass but that is definitely what I am going to work on as soon as I can produce code.
Can you see a way to stuff more than 32bit in there?
Eric
On Tue, Mar 04, 2003 at 11:25:18AM -0500, Jeremy Jackson wrote:
Another idea on the subject:
On some chipsets, the entire range of supported processors have MMX and maybe SSE/SSE2. MMX gives 8 extra 64 bit registers (nobody uses floating-point in LinuxBIOS, right?) that can be used as 32bit. They can't be used as address/index/base, but only for data. See the MOVD instruction.
Likewise, st0..st7 in the FPU could be used as a stack-like 8x64bit data storage, IIRC. I seem to remember that Pentium MMX CPUs share st* space with MMX registers however, so for those platforms one of them will have to do.
//Peter
Peter Stuge stuge-linuxbios@cdy.org writes:
On Tue, Mar 04, 2003 at 11:25:18AM -0500, Jeremy Jackson wrote:
Another idea on the subject:
On some chipsets, the entire range of supported processors have MMX and maybe SSE/SSE2. MMX gives 8 extra 64 bit registers (nobody uses floating-point in LinuxBIOS, right?) that can be used as 32bit. They can't be used as address/index/base, but only for data. See the MOVD instruction.
Likewise, st0..st7 in the FPU could be used as a stack-like 8x64bit data storage, IIRC.
There are no register to register moves from the integer registers to the floating point registers, unless I missed some. The data we care about is integer data, and so any extra registers just act as overflow locations that are used instead of the stack.
I seem to remember that Pentium MMX CPUs share st* space with MMX registers however, so for those platforms one of them will have to do.
As I recall you can only access one of them at a time, but it doesn't matter if there are no register->register transfers.
Anyway for those wondering where I am at. I currently have a compiler that gets as far as intermediate code for a useful subset of C expressions. Now I just have to do some optimizer work, and an actual code generator.
Mostly for the optimizer side I am initially concerned about doing a good job of register allocation.
Hopefully I can have something someone else can play with by the start of next week...
Eric
Steve Gehlbach steve@nexpath.com writes:
Eric W. Biederman wrote:
Oh. I think I can have some answers in less than a month. That is just what I have allocated for investigation. But it will take a week or so to get something working well enough to compare with the inline tweaking approach. Plus right now I am enjoying the change of pace.
Cool, let's see how it goes. Feed it the decode routine from nrv2b.c and see how it likes that.
I will try that when the time comes..
gcc is a long ways from doing this in registers without a major re-write, which is probably about as much work as your implementation in assy. Very clever assy code, BTW.
I just picked that code along with the nrv2b algorithm out of libucl. But I can at least recognize a good thing when I see it.
Also I think the .S files don't have to be kept if we use a trivial post processing program like below.
That looks reasonable. Well there is now a backup plan.
Eric
Eric W. Biederman wrote:
gcc is a long ways from doing this in registers without a major re-write, which is probably about as much work as your implementation in assy. Very clever assy code, BTW.
I just picked that code along with the nrv2b algorithm out of libucl. But I can at least recognize a good thing when I see it.
Yeah, I figured that out after staring at it and the C code for a while, and then found it in the ucl library. After manipulating the decompression C-code for a while, I'm pretty convinced that compiling this without spilling registers is a hard problem, even with a lot of re-arranging. The assy code appears to take into account additional information about the algorithm that is not readily apparent from the C code, at least, that is the conclusion I came to after an hour or so of analyzing it.
-Steve
Eric W. Biederman wrote:
Steve, for a feel of my worries try compiling that code with gcc-3.3. If what I saw earlier today is right it won't work because someone has decided that aggressive inlining is a bad thing...
I did discover (by accident) that gcc3.2 gets more aggressive in inlining if you call the top routine "main" instead of some other function name. gcc2.95 doesn't have this behavior.
-Steve
Jeremy Jackson jerj@coplanar.net writes:
On Thu, 2003-02-13 at 14:11, Steve M. Gehlbach wrote:
Will you be able to use the gnu pre-processor unchanged? or adapt it? The macro expansion seems pretty important.
FYI, I was reading about newer GCC (3.2?) that have merged the preprocessor into the main parser.
It doesn't much matter as the preprocessor is not that complicated.
Eric
On Thu, 13 Feb 2003, GNUOrder wrote:
Would it be easier to design this into gcc as a separate architecture or would that be too unmanageable? It may or may not be of benefit to other embedded projects as well so there may be more help keeping it up to date if it was in the gcc tree. Then again, someone could modify it for their own needs and make it unusable for what it was intended.
and lest we forget, there are 8 32-bit scratchpad registers in most northbridges nowadays; it would be nice to use them.
ron
On 12 Feb 2003, Eric W. Biederman wrote:
Ron. I am going to need to do a branch soon as I do the Hammer port and integrate a small C compiler. And I want to make some clean fixes and remove some of the backwards compatibility cruft, and the linuxbios 1.0 code base is an inappropriate place to target. It should be possible to roll in most of the improvements into the 1.0 codebase, but that will just increase the clutter of the codebase.
OK, but be aware that a person here is doing the PPC port, so we'll need to see how to keep him up-to-date with what you're doing.
Although the PPC code is mostly C, since the cache-as-ram trick is actually well-supported and working on that architecture. We don't want to drop too many mainboards off the edge in the long term.
A branch would be a good place to fix up support for multiple south and north bridges.
ron
"Ronald G. Minnich" rminnich@lanl.gov writes:
On 12 Feb 2003, Eric W. Biederman wrote:
Ron. I am going to need to do a branch soon as I do the Hammer port and integrate a small C compiler. And I want to make some clean fixes and remove some of the backwards compatibility cruft, and the linuxbios 1.0 code base is an inappropriate place to target. It should be possible to roll in most of the improvements into the 1.0 codebase, but that will just increase the clutter of the codebase.
OK, but be aware that a person here is doing the PPC port, so we'll need to see how to keep him up-to-date with what you're doing.
I intend to be as public as I can about it, so the LinuxBIOS list should work.
Although the PPC code is mostly C, since the cache-as-ram trick is actually well-supported and working on that architecture.
Yes, which is one of the things that encouraged me to try it.
We don't want to drop too many mainboards off the edge in the long term.
Agreed. But at the same time our code base is such that hardwaremain() is not as fixed as it should be. Which means that without great care things break.
A branch would be a good place to fix up support for multiple south and north bridges.
I have to. Multiple identical northbridges, and southbridges will actually be quite common on the Hammer architecture. I actually have a fix present in the pci code to detect and handle these but I have not currently pushed the code.
What I expect is that initially everything will work in the 1.0 codebase but only a subset will work with the 2.0 codebase. At the same time a goal of the 2.0 codebase is that the infrastructure should be clean enough that we don't have to keep continually modifying it and breaking things when more boards are brought online.
Ron, I don't know how to manage it but we need to set up a system where we have releases of the core codebase. And one of the tasks of doing a release needs to be to review the changes that went in since the last release so we can avoid things like a broken intel_chip_post macro. Having code like that temporarily in CVS is fine. In the core that is a pain.
The way I expect developers to work is that they start with a stable release of the core and do a port. And then occasionally update to a later core. And when the code is on an up-to-date core, submit it back to the core LinuxBIOS tree.
That roughly seems to fit how Tyson and other embedded guys work, and how I do ports I can support. The problem with submitting code back right now is that the CVS tree is a moving target. And I am very very cautious about importing it into my tree as it may break something, on a port I support.
Eric
On 13 Feb 2003, Eric W. Biederman wrote:
Agreed. But at the same time our code base is such that hardwaremain() is not as fixed as it should be. Which means that without great care things break.
I'm assuming hardwaremain is dead, although I will be sorry to see our very first linuxbios message go in the ashbin of history. :-)
Ron, I don't know how to manage it but we need to set up a system where we have releases of the core codebase. And one of the tasks of doing a release needs to be to review the changes that went in since the last release so we can avoid things like a broken intel_chip_post macro. Having code like that temporarily in CVS is fine. In the core that is a pain.
absolutely. Here is where my experience falls short. Do you have (or does anyone have) experience with managing this sort of thing?
I agree that the tree has been moving pretty quickly. I would request the committers to use the RFC process to this list before making far-reaching changes. Any change that involves .inc or .S files is far-reaching, no matter how small it looks.
ron
"Ronald G. Minnich" rminnich@lanl.gov writes:
On 13 Feb 2003, Eric W. Biederman wrote:
Agreed. But at the same time our code base is such that hardwaremain() is not as fixed as it should be. Which means that without great care things break.
I'm assuming hardwaremain is dead, although I will be sorry to see our very first linuxbios message go in the ashbin of history. :-)
Actually my intention is to simplify it and make it architecture neutral.
The basic layout I am looking at, is to have some kind of list of hardware in the code. And then for each piece of hardware call an initialization function. This has the potential to replace a lot of code in the current hardwaremain.c. Handling initialization order is an interesting problem I have not tackled yet.
There is some prototyping of that in the p4dpr, build for this. I just have not taken advantage of it yet.
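The "list of hardware plus an initialization function per entry" layout could look roughly like this. It is only a sketch: `struct hw_device`, `init_all_devices` and the example hooks are hypothetical names, not code from the LinuxBIOS tree.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical table entry: a name plus an init hook for that hardware. */
struct hw_device {
    const char *name;
    int (*init)(struct hw_device *dev);  /* returns 0 on success */
    int initialized;
};

/* Walk the table in order and call each init hook.
 * Returns the number of devices that initialized successfully. */
static int init_all_devices(struct hw_device *table, size_t count)
{
    int ok = 0;

    assert(table != NULL || count == 0);
    for (size_t i = 0; i < count; i++) {
        if (table[i].init && table[i].init(&table[i]) == 0) {
            table[i].initialized = 1;
            ok++;
        }
    }
    return ok;
}

/* Example hooks standing in for real chip code. */
static int init_ok(struct hw_device *dev)   { (void)dev; return 0; }
static int init_fail(struct hw_device *dev) { (void)dev; return -1; }
```

In this form the table order is the initialization order, so the ordering problem mentioned above shows up as the question of who decides how the table is sorted.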
Ron, I don't know how to manage it but we need to set up a system where we have releases of the core codebase. And one of the tasks of doing a release needs to be to review the changes that went in since the last release so we can avoid things like a broken intel_chip_post macro. Having code like that temporarily in CVS is fine. In the core that is a pain.
absolutely. Here is where my experience falls short. Do you have (or does anyone have) experience with managing this sort of thing?
I agree that the tree has been moving pretty quickly. I would request the committers to use the RFC process to this list before making far-reaching changes. Any change that involves .inc or .S files is far-reaching, no matter how small it looks.
A little bit, and Ken Yap of etherboot does a good job, so it may be worth looking at what he is doing and emulate that to some extent.
Basically one person is the release manager (usually the maintainer of the project). And that person makes the final decision when a release is ready. Bumps the version number, tags CVS and puts out a tarball with all of the code as of that release.
One of the things I do with releases at Linux Networx, is I always make a diff against the last release. Review it. And make certain I know what all of the code in there is for. And I worry more about generic code that affects everyone. And less about what happens on an individual board. And stable releases are different from development releases. For development releases things breaking is expected. For stable releases the level of care and conservatism must go up.
Which is why I figure it is time to start a development version of LinuxBIOS so I can twist, turn, and break things. The 1.0.x series can stay around supporting everything we do now. The 1.2.x or 2.0.x series will handle all new things. And if a motherboard has an active maintainer, the code can be moved to the new codebase. If not we can just drop that port from the latest version of the tree.
But dropping ports and breaking ports by design should only happen on major versions of LinuxBIOS. And new major versions should come very seldom. As our abstraction layer gets better there are fewer and fewer reasons to break an existing port.
But there are a lot of dead experimental features whose time has come to be dropped. Non ELF booting. northbridge_fixup, southbridge_fixup, etc calls in hardwaremain. Which are just not generic enough. Some of the old pci calls. The etherboot built into LinuxBIOS, etc.
Every project is different and things work out differently. By dropping the code in a development release and still supporting it in the last stable release things should get a lot better.
Eric
On 13 Feb 2003, Eric W. Biederman wrote:
The basic layout I am looking at, is to have some kind of list of hardware in the code. And then for each piece of hardware call an initialization function. This has the potential to replace a lot of code in the current hardwaremain.c. Handling initialization order is an interesting problem I have not tackled yet.
I think the superio stuff is one possible way. I've found that three passes covers the case for superio, and actually for southbridge.
For the multiple northbridge case did you want to discover northbridges dynamically or continue to specify it in a config file?
The _fixup stuff goes away with the three pass approach, I think.
I'd like to hash this out a bit on the list the way I've hashed out other approaches in the past (config tool, new superio, etc.) to see if we can't scare up good ideas from other people.
ron
"Ronald G. Minnich" rminnich@lanl.gov writes:
On 13 Feb 2003, Eric W. Biederman wrote:
The basic layout I am looking at, is to have some kind of list of hardware in the code. And then for each piece of hardware call an initialization function. This has the potential to replace a lot of code in the current hardwaremain.c. Handling initialization order is an interesting problem I have not tackled yet.
I think the superio stuff is one possible way. I've found that three passes covers the case for superio, and actually for southbridge.
For the multiple northbridge case did you want to discover northbridges dynamically or continue to specify it in a config file?
Auto discovery is necessary as everything may not be plugged in. But I suspect I will also want to specify information that is valid if the cpu/northbridge is plugged in.
The _fixup stuff goes away with the three pass approach, I think.
I need to look closely at this issue. We have one huge hook before hardwaremain. Beyond that I suspect doing something as simple as going through the devices in a tree structured order would remove the need for multiple passes.
I'd like to hash this out a bit on the list the way I've hashed out other approaches in the past (config tool, new superio, etc.) to see if we can't scare up good ideas from other people.
We are still on the list. And talking is good. But to a certain extent you don't see things until code is written and you try it.
So we need a development branch to try these things out on. I don't promise we will get it perfect the first try. The goal is to get it close enough that we won't break ports by going the last few inches.
Things like having no guaranteed order in which the code will be called, independent of the device tree, should help. By device tree I am thinking of something roughly like the pci device tree: how far devices are from the cpu. The goal is not perfection but a useful approximation of reality.
Eric
On 14 Feb 2003, Eric W. Biederman wrote:
I need to look closely at this issue. We have one huge hook before hardwaremain. Beyond that I suspect doing something as simple as going through the devices in a tree structured order would remove the need for multiple passes.
I am not so sure. The Acer superio needed at least two passes, one before pci scan and one after, the reason being that certain resources were not visible on PCI unless an enable was set, then the device got scanned, then some post-scan fixup got done.
ron
I am not so sure. The Acer superio needed at least two passes, one before pci scan and one after, the reason being that certain resources were not visible on PCI unless an enable was set, then the device got scanned, then some post-scan fixup got done.
Same as VIA for enabling ethernet and IDE's compatible mode.
-Andrew
Andrew Ip aip@cwlinux.com writes:
I am not so sure. The Acer superio needed at least two passes, one before pci scan and one after, the reason being that certain resources were not visible on PCI unless an enable was set, then the device got scanned, then some post-scan fixup got done.
Same as VIA for enabling ethernet and IDE's compatible mode.
For a number of these cases, forcing the device into the table of devices present can help a lot.
Eric
On 14 Feb 2003, Eric W. Biederman wrote:
For a number of these cases, forcing the device into the table of devices present can help a lot.
yes but, there is a loop.
= enable some piece of the device
= scan pci
= enable more bits of the device based on the pci scan
I am pretty sure dependency trees don't cover that kind of cycle.
ron
"Ronald G. Minnich" rminnich@lanl.gov writes:
On 14 Feb 2003, Eric W. Biederman wrote:
For a number of these cases, forcing the device into the table of devices present can help a lot.
yes but, there is a loop.
= enable some piece of the device
= scan pci
= enable more bits of the device based on the pci scan
I am pretty sure dependency trees don't cover that kind of cycle.
First what the code does right now is:
Enable pci bus.
scan pci bus.
Enable pci buses found.
scan pci buses found.
etc.
So in the worst case you should be able to lie and put in a pseudo device, or two that happen to be called at the proper times.
Which is why I think I can get away with a single pass...
Eric
On 14 Feb 2003, Eric W. Biederman wrote:
So in the worst case you should be able to lie and put in a pseudo device, or two that happen to be called at the proper times.
pseudo-device per chip type?
Or a generic pseudo-device? or ...
ron
"Ronald G. Minnich" rminnich@lanl.gov writes:
On 14 Feb 2003, Eric W. Biederman wrote:
So in the worst case you should be able to lie and put in a pseudo device, or two that happen to be called at the proper times.
pseudo-device per chip type?
Or a generic pseudo-device? or ...
In most cases I think just putting it in the table of devices, so we find the device even if the default pci_scan does not report it, should be enough.
For the weird cases, I am content that I can put a dummy device at an appropriate place in the list of devices, and have it do something. That is not a case I plan on using.
If we can get the abstraction so it actually models what is going on, we are in much better shape. And that is what I intend to concentrate on.
So what I am looking at:
The current pci bus scan starts with a root pci bus device. It then finds the devices on that bus, and then recursively scans the sub buses.
And the following things are possible.
We can have multiple top level devices. We can force a device onto a given pci bus.
I want to make this process device driven, and get the configuration to just decorate the device tree with information such as irq routing, and default device settings (like the baud rate).
Irq routing currently does interesting things if you plug in a pci card with a bridge chip on it. I believe all of our current tables have the potential to hand out the wrong information as the bus numbers change.
Eric
On 14 Feb 2003, Eric W. Biederman wrote:
So what I am looking at:
The current pci bus scan starts with a root pci bus device. It then finds the devices on that bus, and then recursively scans the sub buses.
I'm still not getting it.
For some chips, LinuxBIOS needs to call some chip-specific code for that device BEFORE the pci bus scan, or resource allocation gets done incorrectly. How can we structure the code so that the pre-pci init functions work in a reasonable way that most people can understand? The current superio stuff does this, and the information to drive it is contained in the superio file. If you have code that needs to be called before the PCI scan, you initialize the structure member to point to a function. Same for after PCI scan, same for right before hardwaremain() exits. You can look at the superio.c file for the given chip and see if this pre-pci-scan function exists and is called.
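A minimal sketch of the kind of self-contained hook structure described above. All names here (superio_hooks, superio_run_phase, example_hooks and so on) are hypothetical and chosen for illustration; this is not the actual LinuxBIOS superio structure. It only shows the idea that a chip file fills in the members it needs and leaves the rest unset.

```c
#include <assert.h>
#include <stddef.h>

/* All names here are hypothetical; this is not the real LinuxBIOS structure. */
struct superio;

struct superio_hooks {
    void (*pre_pci_init)(struct superio *sio); /* before the pci bus scan */
    void (*init)(struct superio *sio);         /* after the pci bus scan */
    void (*finishup)(struct superio *sio);     /* right before hardwaremain() exits */
};

struct superio {
    const char *name;
    const struct superio_hooks *hooks;
    int init_calls; /* demo bookkeeping only */
};

enum sio_phase { SIO_PRE_PCI, SIO_POST_PCI, SIO_FINISHUP };

/* Example chip that only needs the post-scan hook. */
static void example_init(struct superio *sio)
{
    sio->init_calls++;
}

const struct superio_hooks example_hooks = {
    .pre_pci_init = NULL,
    .init = example_init,
    .finishup = NULL,
};

/* Run one phase over every chip; hooks left NULL are skipped.
 * Returns how many hooks were actually called. */
int superio_run_phase(struct superio *chips, size_t count, enum sio_phase phase)
{
    int called = 0;
    size_t i;

    assert(chips != NULL || count == 0);
    for (i = 0; i < count; i++) {
        const struct superio_hooks *h = chips[i].hooks;
        void (*fn)(struct superio *) = NULL;

        if (h == NULL)
            continue;
        switch (phase) {
        case SIO_PRE_PCI:  fn = h->pre_pci_init; break;
        case SIO_POST_PCI: fn = h->init;         break;
        case SIO_FINISHUP: fn = h->finishup;     break;
        }
        if (fn != NULL) {
            fn(&chips[i]);
            called++;
        }
    }
    return called;
}
```

With a layout like this, reading the chip's own file is enough to tell which phases it takes part in: a NULL member means the chip does nothing at that point.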
I like the self-contained nature of the superio.c files (I grabbed all those ideas from Plan 9). Does this seem desirable to people? If so, can we expand it to the south and north bridges so as to replace the current ad-hoc mechanisms?
If this mechanism is not desirable, how would it change? Would the .c files continue to be self-contained or would we start getting into more linker sets and other magic stuff (which, as you can tell, worries me as it does confuse people).
I would appreciate some feedback from the other committers and developers.
ron
"Ronald G. Minnich" rminnich@lanl.gov writes:
On 14 Feb 2003, Eric W. Biederman wrote:
So what I am looking at:
The current pci bus scan starts with a root pci bus device. It then finds the devices on that bus, and then recursively scans the sub buses.
O.k. Getting into specifics:
From our pci.h, the current operations we have for a pci device.
struct pci_dev_operations {
        void (*read_resources)(struct pci_dev *dev);
        void (*set_resources)(struct pci_dev *dev);
        void (*init)(struct pci_dev *dev);
        unsigned int (*scan_bus)(struct pci_dev *bus, unsigned int max);
};
I'm still not getting it.
For some chips, LinuxBIOS needs to call some chip-specific code for that device BEFORE the pci bus scan, or resource allocation gets done incorrectly. How can we structure the code so that the pre-pci init functions work in a reasonable way that most people can understand? The current superio stuff does this, and the information to drive it is contained in the superio file. If you have code that needs to be called before the PCI scan, you initialize the structure member to point to a function. Same for after PCI scan, same for right before hardwaremain() exits. You can look at the superio.c file for the given chip and see if this pre-pci-scan function exists and is called.
O.k. The superio code I definitely figure is a step in that direction.
This discussion must get into nitty gritty specifics if we are going to reach a consensus.
What we currently do:
pci_enumerate();
pci_configure();
pci_initialize();
In pci_enumerate(), if a device is in the hard coded list it is placed into the list of pci devices even if it does not exist. This allows invisible devices on the motherboard to be configured.
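As a rough illustration of forcing devices into the list, here is a sketch that merges the devices that answered a probe with a hard coded table. The types and the function name enumerate_with_forced are hypothetical, and the real pci_enumerate() builds a device tree rather than a flat array; the sketch only shows that a forced entry ends up in the result whether or not the probe found it.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical types, for illustration only. */
struct pci_slot {
    unsigned int bus;
    unsigned int devfn;
};

struct pci_entry {
    struct pci_slot slot;
    int responded; /* 1 if the device answered the probe */
    int forced;    /* 1 if it appears in the hard coded list */
};

static int same_slot(struct pci_slot a, struct pci_slot b)
{
    return a.bus == b.bus && a.devfn == b.devfn;
}

/* Fill out[] with every probed device plus every forced device.
 * Returns the number of entries written (at most cap). */
size_t enumerate_with_forced(const struct pci_slot *probed, size_t n_probed,
                             const struct pci_slot *forced, size_t n_forced,
                             struct pci_entry *out, size_t cap)
{
    size_t n = 0, i, j;

    assert(out != NULL);
    for (i = 0; i < n_probed && n < cap; i++) {
        out[n].slot = probed[i];
        out[n].responded = 1;
        out[n].forced = 0;
        n++;
    }
    for (i = 0; i < n_forced; i++) {
        for (j = 0; j < n; j++) {
            if (same_slot(out[j].slot, forced[i]))
                break;
        }
        if (j < n) {
            out[j].forced = 1;         /* already found by the probe */
        } else if (n < cap) {
            out[n].slot = forced[i];   /* invisible device, add it anyway */
            out[n].responded = 0;
            out[n].forced = 1;
            n++;
        }
    }
    return n;
}
```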
In addition each device is assigned a set of operations. set_pci_ops does that work.
set_pci_ops first looks through a table of device drivers, listed by vendor and device id, to see if it can find a specific driver for that device. If a driver is found it uses those device methods. Otherwise, depending on whether it is a bridge or a normal pci device, a default set of methods is assigned.
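The lookup just described could look roughly like the following. This is a sketch under assumptions, not the code in the tree: the operations structure is trimmed to one member, the driver table entry uses made-up ids, and the names pci_driver, default_dev_ops, default_bridge_ops and example_chip_ops are hypothetical. The bridge test relies on the PCI header type register, where type 0x01 in the low seven bits marks a PCI-to-PCI bridge.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct pci_dev;

/* Trimmed to one method for the sketch. */
struct pci_dev_operations {
    void (*init)(struct pci_dev *dev);
};

struct pci_dev {
    uint16_t vendor;
    uint16_t device;
    uint8_t hdr_type; /* PCI header type register */
    const struct pci_dev_operations *ops;
};

/* Hypothetical driver table entry. */
struct pci_driver {
    uint16_t vendor;
    uint16_t device;
    const struct pci_dev_operations *ops;
};

const struct pci_dev_operations default_dev_ops = { NULL };
const struct pci_dev_operations default_bridge_ops = { NULL };
const struct pci_dev_operations example_chip_ops = { NULL };

/* Made-up vendor and device ids, for illustration only. */
static const struct pci_driver pci_drivers[] = {
    { 0x1234, 0x5678, &example_chip_ops },
};

void set_pci_ops(struct pci_dev *dev)
{
    size_t i;

    assert(dev != NULL);
    /* A specific driver wins if one matches vendor and device id. */
    for (i = 0; i < sizeof(pci_drivers) / sizeof(pci_drivers[0]); i++) {
        if (pci_drivers[i].vendor == dev->vendor &&
            pci_drivers[i].device == dev->device) {
            dev->ops = pci_drivers[i].ops;
            return;
        }
    }
    /* Otherwise fall back to a default set of methods. */
    if ((dev->hdr_type & 0x7f) == 0x01)
        dev->ops = &default_bridge_ops;
    else
        dev->ops = &default_dev_ops;
}
```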
After pci_scan_bus finishes enumerating the devices on a bus, it looks at each child device to see if it has a scan_bus method. If so, it calls child->ops->scan_bus(), which is usually just pci_scan_bus again.
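A simplified sketch of that recursion, assuming the children of each bus are already linked into a tree (the real pci_scan_bus probes the hardware to find them). The secondary and subordinate fields, and the way max is advanced to hand out bus numbers, are assumptions made for illustration rather than something taken from the LinuxBIOS source; only the scan_bus signature comes from pci.h as quoted above.

```c
#include <assert.h>
#include <stddef.h>

struct pci_dev;

/* Only the scan_bus method is kept for this sketch. */
struct pci_dev_operations {
    unsigned int (*scan_bus)(struct pci_dev *bus, unsigned int max);
};

struct pci_dev {
    const struct pci_dev_operations *ops;
    struct pci_dev *children;  /* first device behind this bridge */
    struct pci_dev *sibling;   /* next device on the same bus */
    unsigned int secondary;    /* assumed: bus number directly behind the bridge */
    unsigned int subordinate;  /* assumed: highest bus number behind the bridge */
};

/* Walk the children of a bus; every child with a scan_bus method gets the
 * next free bus number and is scanned in turn. Returns the highest bus
 * number handed out so far. */
unsigned int pci_scan_bus(struct pci_dev *bus, unsigned int max)
{
    struct pci_dev *child;

    assert(bus != NULL);
    for (child = bus->children; child != NULL; child = child->sibling) {
        if (child->ops != NULL && child->ops->scan_bus != NULL) {
            child->secondary = ++max;
            max = child->ops->scan_bus(child, max);
            child->subordinate = max;
        }
    }
    return max;
}

/* Default bridge methods: scanning a bridge is just pci_scan_bus again. */
const struct pci_dev_operations bridge_ops = { pci_scan_bus };
```

Because the numbers are handed out in scan order, plugging in a card with a bridge on it shifts the bus number of every bridge scanned after it, which is why tables keyed on fixed bus numbers can go stale.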
In pci_configure, the device tree is walked again. For each device its read_resources operation is called, to find the BARs on that device. read_resources allows for the presence of non-standard BARs. Each BAR read_resources returns can either be a fixed resource that must be catered to, or a resource needing to be assigned a value.
Then a second pass is made over the tree and set_resources is called on each device.
For bridges, read_resources and set_resources are recursive.
In pci_initialize, the device tree is walked one last time, and this time the init() method of each device is called.
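To make the order of the passes concrete, here is a small sketch that records which step touches which device. It is a simplification under stated assumptions: the struct device type, the call log, and the generic walk() helper are hypothetical, and the walker itself recurses into children, whereas in the scheme described above the bridge's own read_resources and set_resources methods do the recursion.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_LOG 16

/* Hypothetical, stripped-down device node. */
struct device {
    const char *name;
    struct device *children; /* first child */
    struct device *sibling;  /* next device on the same bus */
};

enum step { READ_RESOURCES, SET_RESOURCES, INIT };

struct log_entry {
    enum step step;
    const struct device *dev;
};

struct log_entry call_log[MAX_LOG];
size_t call_count;

/* Stand-in for calling the device method: just remember the call. */
static void record(enum step s, const struct device *dev)
{
    assert(call_count < MAX_LOG);
    call_log[call_count].step = s;
    call_log[call_count].dev = dev;
    call_count++;
}

/* Visit each device, then everything below it. */
static void walk(struct device *dev, enum step s)
{
    for (; dev != NULL; dev = dev->sibling) {
        record(s, dev);
        walk(dev->children, s);
    }
}

/* Two passes: first collect resources, then assign them. */
void pci_configure(struct device *root)
{
    walk(root, READ_RESOURCES);
    walk(root, SET_RESOURCES);
}

/* Final pass: call init on every device. */
void pci_initialize(struct device *root)
{
    walk(root, INIT);
}
```

The point of the sketch is that every device has had its resources read before any device has them set, and every device is fully configured before the first init runs.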
I like the self-contained nature of the superio.c files (I grabbed all those ideas from Plan 9). Does this seem desirable to people?
Yes. Self contained is good. The one part that my newer pci code is lacking, and that you did well in the superio code, is a way to build the static device information. I suspect we want to allow a different structure for each different device, but basically I like the superio directive.
If so, can we expand it to the south and north bridges so as to replace the current ad-hoc mechanisms?
Look at what I have done with pci. Essentially I have already done that. Everything but making the configuration manageable.
If this mechanism is not desirable, how would it change? Would the .c files continue to be self-contained or would we start getting into more linker sets and other magic stuff (which, as you can tell, worries me as it does confuse people).
The only magic thing that I might touch with a linker is building the table of devices. But that is to make things more self contained.
I would appreciate some feedback from the other committers and developers.
Agreed. Just Eric talking is imperfect.
I just looked through the LinuxBIOS tree at every superio.c. And perhaps I missed something but I did not see a single superio file that used more than one of the three functions defined, and none of them used the pre pci function.
So all of those can be straightforwardly transformed so that their init or finishup method is converted into a pci init method and called from pci_initialize().
For the hypothetical case of needing a device that is called before the rest of the pci initialization, it can be forced into the device tree ahead of everything else. And its scan_bus method can do whatever special magic is required.
Similarly for the hypothetical case of needing a device that is called after everything else, a dummy device can be inserted at the very bottom and end of the pci device tree, so that its init method will be called last.
So it is possible, in a nasty way, to cope with those hypothetical situations, buying time for a clean fix to be put into the LinuxBIOS tree.
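A sketch of the placement trick, assuming a flat singly linked device list instead of the real device tree. The struct device type and the helpers insert_first and append_last are hypothetical; they only illustrate that a pseudo device placed at the head is visited before everything else and one placed at the tail is visited last.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical flat device list; the real structure is a tree. */
struct device {
    const char *name;
    struct device *next;
};

/* Put a pseudo device ahead of everything else, so its methods are
 * the first ones called when the list is walked. */
struct device *insert_first(struct device *head, struct device *pseudo)
{
    assert(pseudo != NULL);
    pseudo->next = head;
    return pseudo;
}

/* Put a pseudo device at the very end, so its init method runs last. */
struct device *append_last(struct device *head, struct device *pseudo)
{
    struct device *d;

    assert(pseudo != NULL);
    pseudo->next = NULL;
    if (head == NULL)
        return pseudo;
    for (d = head; d->next != NULL; d = d->next)
        ;
    d->next = pseudo;
    return head;
}
```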
If you have specific examples of things that need to be done I am quite willing to discuss them and how I would fit it into the new frame work.
The code flow I am confident of. The mechanism to configure it is a problem I do not currently have solved. But I suspect we can use something like the current superio configuration to statically build parts of the device tree.
Eric
On Thu, 2003-02-13 at 05:29, Eric W. Biederman wrote:
Years ago when C was young and implementations were not readily available people wrote compilers for subsets of C and called them small C compilers.
Not sure if this is of any use/relevance, but mention of Small-C brought back some distant memories... On my bookshelf I have a copy of:
The Small-C Handbook James E. Hendrix 1984
It has the source of the small-c compiler in it. It is targeted at the 8080 though. The source is only a thousand or so lines of C. Can't see any sign of a licence though..
Way back when I got it I had thought of porting it to the 68000.. but ended up using BCPL instead....
Ian Castle ian.castle@coldcomfortfarm.net writes:
On Thu, 2003-02-13 at 05:29, Eric W. Biederman wrote:
Years ago when C was young and implementations were not readily available people wrote compilers for subsets of C and called them small C compilers.
Not sure if this is of any use/relevance, but mention of Small-C brought back some distant memories... On my bookshelf I have a copy of:
The Small-C Handbook James E. Hendrix 1984
It has the source of the small-c compiler in it. It is targeted at the 8080 though. The source is only a thousand or so lines of C. Can't see any sign of a licence though..
I think I have seen it under a shareware type license...
With the small size you can see why I find it attractive to just write a compiler...
Eric