Re: file system error (fwd)

Ronald G. Minnich

29 Nov 2002

1 p.m.

any ideas?

ron

---------- Forwarded message ----------
Date: Fri, 29 Nov 2002 15:35:52 +0900
From: Munjun Kang malas@pinetron.com
To: Ronald G. Minnich rminnich@lanl.gov
Subject: Re: file system error

Thanks for your reply.

I tried what you suggested.

dd if=/dev/zero of=/dev/hda bs=1024 count=10000 ~ 100000

Segmentation faults occurred at random, at varying counts.

Then I turned off the UDMA feature with an hdparm option, but I see the same symptom.

This time, I tried attaching SCSI and IEEE1394 SBP-2 devices.

case 1. Adaptec 2930 SCSI adapter + 8GB Seagate SCSI HDD
case 2. IEEE1394 interface card + external IEEE1394 HDD

Both cases show the same problem.

Now I think it's not a DMA problem. I'm lost in a maze. Hmmm...

Is there any clear hint?

----- Original Message -----
From: "Ronald G. Minnich" rminnich@lanl.gov
To: "Munjun Kang" malas@pinetron.com
Cc: linuxbios@clustermatic.org
Sent: Thursday, November 28, 2002 12:55 AM
Subject: Re: file system error

...

can you try a few things:

do a raw dd command to the disk from /dev/zero and see if that works,

e.g. dd if=/dev/zero of=/dev/hda

turn off UDMA, in fact turn off ALL dma at first

try to make it run totally PIO

This is weird, but it does seem like some kind of DMA problem.

Also, boot under the normal bios and do an lspci -xxx and then do an lspci -xxx under linuxbios and see if you see any IDE setup differences.

ron

Nathanael Noblet

29 Nov

1:44 p.m.

New subject: file system error (fwd)

On Friday, November 29, 2002, at 10:03 AM, Ronald G. Minnich wrote:

...

This time, I tried attaching SCSI and IEEE1394 SBP-2 devices.

case 1. Adaptec 2930 SCSI adapter + 8GB Seagate SCSI HDD
case 2. IEEE1394 interface card + external IEEE1394 HDD

Both cases show the same problem.

I worked on some IEEE1394 devices about 4-6 months ago. I don't think there was support for SBP-2 devices yet. In fact, I don't remember when it was supposed to start being supported, so that may be the problem. I haven't kept up with IEEE1394 for a while, though.

-- Nathanael Noblet Gnat Solutions 4604 Monterey Ave NW Calgary, AB T3B 5K4 T/F 403.288.5360 C 403.809.5368 http://www.gnat.ca/

Justin Cormack

2:27 p.m.

New subject: file system error (fwd)

On Fri, 2002-11-29 at 17:31, Nathanael Noblet wrote:

...

I worked on some IEEE1394 devices about 4-6 months ago. I don't think there was support for SBP-2 devices yet. In fact, I don't remember when it was supposed to start being supported, so that may be the problem. I haven't kept up with IEEE1394 for a while, though.

Er, no. sbp2 has been supported for well over 6 months. It has been in 2.4 for quite some time.

Justin

Nathanael Noblet

2:50 p.m.

New subject: file system error (fwd)

On Friday, November 29, 2002, at 11:19 AM, Justin Cormack wrote:

...

Er, no. sbp2 has been supported for well over 6 months. It has been in 2.4 for quite some time.

Oh right, I was looking into a feature of SBP-2 (I wanted my PC to act as a device towards another machine, allowing the "client" to use the HDs and CD writers) and that wasn't working.

-- Nathanael Noblet Gnat Solutions 4604 Monterey Ave NW Calgary, AB T3B 5K4 T/F 403.288.5360 C 403.809.5368 http://www.gnat.ca/

ebiederman@lnxi.com

3:11 p.m.

New subject: file system error (fwd)

"Ronald G. Minnich" rminnich@lanl.gov writes:

...

dd if=/dev/zero of=/dev/hda bs=1024 count=10000 ~ 100000

Segmentation faults occurred at random, at varying counts.

Then I turned off the UDMA feature with an hdparm option, but I see the same symptom.

Is all DMA turned off? That is, did you run hdparm -d0 /dev/hda?

...

This time, I tried attaching SCSI and IEEE1394 SBP-2 devices.

case 1. Adaptec 2930 SCSI adapter + 8GB Seagate SCSI HDD
case 2. IEEE1394 interface card + external IEEE1394 HDD

Both cases show the same problem.

The northbridge having DMA problems is still a candidate. SCSI disks do DMA as well.

...

Now I think it's not a DMA problem. I'm lost in a maze. Hmmm...

Is there any clear hint?

Past history with the Athlon problems on VIA chipsets says that some VIA northbridges have problems with burst traffic.

Either DMA or a fast memory copy could trigger it. memtest86 currently does not have an optimized memcpy, so it could miss that problem.

Currently I consider your northbridge to be the best candidate. Is the same kernel run under both BIOSes?

Compared to your previous BIOS, are there any unknown settings in the northbridge?

In particular, what are the differences in the output of

lspci -s 0:0.0 -xxx

on both boards, and can you account for all of them?

Eric
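Eric's point about large, fast copies can be illustrated with a copy-and-compare loop. This is only a minimal sketch: the function name, buffer size, and pass count are made up for illustration, and whether the interpreter's block copy actually produces the kind of burst traffic that would upset a northbridge is an assumption, not something the thread confirms.

```python
import os


def copy_and_compare(size_bytes=4 * 1024 * 1024, passes=8):
    """Copy a random buffer several times and count passes where the copy differs.

    Hypothetical helper for illustration; it is not a replacement for memtest86.
    """
    mismatched_passes = 0
    for _ in range(passes):
        src = os.urandom(size_bytes)   # fresh random pattern for each pass
        dst = bytearray(size_bytes)    # destination buffer of the same size
        dst[:] = src                   # one large block copy
        if bytes(dst) != src:          # verify the copy byte for byte
            mismatched_passes += 1
    return mismatched_passes


if __name__ == "__main__":
    print("mismatched passes:", copy_and_compare())
```

On healthy hardware the count should stay at zero; any non-zero result would point at memory or chipset trouble rather than at the disk.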

Munjun Kang

1 Dec

8:51 p.m.

New subject: file system error (fwd)

...

Is all DMA turned off? That is, did you run hdparm -d0 /dev/hda?

Yes, I did it.

...

Currently I consider your northbridge to be the best candidate. Is the same kernel run under both BIOSes?

I tried several cases:

1. Built-in BIOS + 2.4.18-13 (Red Hat 8.0) => works
2. LinuxBIOS + 2.4.18-13 (Red Hat 8.0) => doesn't work
3. LinuxBIOS + 2.4.19 => doesn't work

...

Compared to your previous BIOS, are there any unknown settings in the northbridge?

In particular, what are the differences in the output of

lspci -s 0:0.0 -xxx

on both boards, and can you account for all of them?

Working case:

00:00.0 Host bridge: VIA Technologies, Inc. VT8605 [ProSavage PM133]
00: 06 11 05 06 06 00 10 a2 00 00 00 06 00 08 00 00
10: 08 00 00[e8]00 00 00 00 00 00 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00[06 11 05 06]
30: 00 00 00 00 a0 00 00 00 00 00 00 00 00 00 00 00
40: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
50: fd db c8 be 05 00 08 08 c0 00 08 08 08 08 08 08
60: 03[aa]00[20]e6 d5 d5 00 43 38 86 0d 08 21 00 00
70: c4 88[cc]0c 0e 81 52 00 01 b4 09 00 00 00 00 00
80: 0f 40 00 00 c0 00 00 00 02 00 00 00 00 00 00 00
90: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
a0: 02 c0 20 00 07 02 00 1f 00 00 00 00 2f 02 04 00
b0: 40 ff 10 00 00 00 00 00 00 00 00 00 00 00 00 00
c0: 01 00 02 00 00 00 00 00 00 00 00 00 00 00 00 00
d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
f0: 00 00 00 00 00 00 06 00 32 42 00 b0 00 00 00 00

Problem case:

00:00.0 Class 0600: 1106:0605
00: 06 11 05 06 06 00 10 a2 00 00 00 06 00 08 00 00
10: 08 00 00[f8]00 00 00 00 00 00 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00[00 00 00 00]
30: 00 00 00 00 a0 00 00 00 00 00 00 00 00 00 00 00
40: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
50: fd db c8 be 05 00 08 08 c0 00 08 08 08 08 08 08
60: 03[00]00[00]e6 d5 d5 00 43 38 86 0d 08 21 00 00
70: c4 88[4c]0c 0e 81 52 00 01 b4 09 00 00 00 00 00
80: 0f 40 00 00 c0 00 00 00 02 00 00 00 00 00 00 00
90: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
a0: 02 c0 20 00 07 02 00 1f 00 00 00 00 2f 02 04 00
b0: 40 ff 10 00 00 00 00 00 00 00 00 00 00 00 00 00
c0: 01 00 02 00 00 00 00 00 00 00 00 00 00 00 00 00
d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
f0: 00 00 00 00 00 00 06 00 32 42 00 b0 00 00 00 00

0x13 : Graphics Aperture Base
0x2c : I don't know; maybe the same as the vendor & device ID
0x61, 0x63 : shadow RAM settings
0x72 : CPU to PCI flow control. The differing bit, bit 7, is described as follows:

    Bit 7  Retry Status
      0    No retry occurred (default)
      1    Retry occurred (write 1 to clear)

In my opinion, there are no significant differences.
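A comparison like the one above can also be done mechanically. The following is a minimal sketch, not a tool from the thread: `parse_dump` and `diff_dumps` are hypothetical helper names, and the sample rows are the three differing rows copied from the two dumps, with the bracket highlights left in to show that they are ignored.

```python
import re

# A hex row looks like "10: 08 00 00 e8 ..."; the device description line does not match.
ROW = re.compile(r"^\s*([0-9a-fA-F]{2}):((?:\s+[0-9a-fA-F]{2})+)\s*$")


def parse_dump(text):
    """Return {offset: byte} for every hex row of an `lspci -xxx` style dump."""
    regs = {}
    for line in text.splitlines():
        # The [..] marks used in this thread are hand-made highlights, not data.
        cleaned = line.replace("[", " ").replace("]", " ")
        match = ROW.match(cleaned)
        if match is None:
            continue
        base = int(match.group(1), 16)
        for index, byte in enumerate(match.group(2).split()):
            regs[base + index] = int(byte, 16)
    return regs


def diff_dumps(working, failing):
    """List (offset, working_byte, failing_byte) for every register that differs."""
    a, b = parse_dump(working), parse_dump(failing)
    return [(off, a[off], b[off]) for off in sorted(a.keys() & b.keys()) if a[off] != b[off]]


WORKING = """\
10: 08 00 00[e8]00 00 00 00 00 00 00 00 00 00 00 00
60: 03[aa]00[20]e6 d5 d5 00 43 38 86 0d 08 21 00 00
70: c4 88[cc]0c 0e 81 52 00 01 b4 09 00 00 00 00 00
"""
FAILING = """\
10: 08 00 00[f8]00 00 00 00 00 00 00 00 00 00 00 00
60: 03[00]00[00]e6 d5 d5 00 43 38 86 0d 08 21 00 00
70: c4 88[4c]0c 0e 81 52 00 01 b4 09 00 00 00 00 00
"""

if __name__ == "__main__":
    for offset, good, bad in diff_dumps(WORKING, FAILING):
        # The XOR shows exactly which bits changed in each register.
        print(f"0x{offset:02x}: {good:02x} -> {bad:02x} (changed bits 0x{good ^ bad:02x})")
```

Feeding it the full output of `lspci -s 0:0.0 -xxx` from each boot should work the same way, since lines that are not hex rows are skipped.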

...

Eric

Linuxbios mailing list Linuxbios@clustermatic.org http://www.clustermatic.org/mailman/listinfo/linuxbios

steven james

2 Dec

8:35 a.m.

New subject: file system error (fwd)

Greetings,

Is there any possibility that the amount of memory passed to the kernel is too high and the crashes are due to using non-existent memory for a buffer or struct?

The 1st test is to check the reported memory on boot, the 2nd is memtest86, and the 3rd is to pass mem=(some small value) to the kernel and see what happens.

G'day, sjames
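The third test can be turned into a bisection over mem= values. This is a minimal sketch under stated assumptions: `largest_stable_mem` is a hypothetical helper, `boots_ok` stands in for a manual step (reboot with mem=<N>m and run the failing workload), and the search assumes that once a value fails, every larger value fails too.

```python
def largest_stable_mem(boots_ok, low_mb, high_mb, step_mb=1):
    """Return the largest mem= value in MB for which boots_ok(mb) is true.

    Assumes boots_ok(low_mb) is true, boots_ok(high_mb) is false,
    and that failures are monotonic in between.
    """
    while high_mb - low_mb > step_mb:
        mid_mb = (low_mb + high_mb) // 2
        if boots_ok(mid_mb):
            low_mb = mid_mb    # still stable: search higher
        else:
            high_mb = mid_mb   # unstable: search lower
    return low_mb


if __name__ == "__main__":
    # Stand-in predicate with a made-up limit; on real hardware each call is a reboot.
    pretend_limit_mb = 96
    print(largest_stable_mem(lambda mb: mb < pretend_limit_mb, 16, 128))
```

Each probe costs a reboot, so halving the range keeps the number of boots small compared with stepping through values one at a time.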

--
-------------------------steven james, director of research, linux labs
... ........ ..... ....                    230 peachtree st nw ste 701
the original linux labs                          atlanta.ga.us 30303
-since 1995                              http://www.linuxlabs.com
                                   office 404.577.7747 fax 404.577.7743
-----------------------------------------------------------------------

Ronald G. Minnich

10:24 a.m.

New subject: file system error (fwd)

On Mon, 2 Dec 2002, steven james wrote:

...

Is there any possibility that the amount of memory passed to the kernel is too high and the crashes are due to using non-existent memory for a buffer or struct?

that has worked for me on some ugly platforms, so it is a good thing to try.

ron

Munjun Kang

4 Dec

9:05 p.m.

New subject: file system error (fwd)

Thanks for your concern.

...

Greetings,

Is there any possibility that the amount of memory passed to the kernel is too high and the crashes are due to using non-existent memory for a buffer or struct?

I have 128 MB of memory installed (in one bank). But my northbridge uses SMA (shared memory architecture) for the internal graphics core (Savage 4). For SMA, the northbridge reserves 8 to 32 MB of physical memory. I tested various SMA sizes: 0, 8, 32, etc. I think this is what causes the problem.

...

The 1st test is to check the reported memory on boot, the 2nd is memtest86, and the 3rd is to pass mem=(some small value) to the kernel and see what happens.

1st: memory reporting through the e820 memory map is good.
2nd: memtest86 usually reports no errors, but sometimes shows about 1 error per day.
3rd: the mem=xxm command-line option shows various symptoms. Below mem=80m it works well. From mem=80m up to 120m, it shows either a kernel panic after "found compressed" or a segmentation fault during hard disk access.

...

G'day, sjames

On Mon, 2 Dec 2002, Munjun Kang wrote:

Thanks for reading.

Malas. (Munjun Kang)

Munjun Kang

9 Dec

5:36 a.m.

New subject: file system error

Thanks for your concern.

...

Greetings,

Is there any possibility that the amount of memory passed to the kernel is too high and the crashes are due to using non-existent memory for a buffer or struct?

...

The 1st test is to check the reported memory on boot, the 2nd is memtest86, and the 3rd is to pass mem=(some small value) to the kernel and see what happens.

...

G'day, sjames

On Mon, 2 Dec 2002, Munjun Kang wrote:

Thanks for reading.

Malas. (Munjun Kang)

steven james

10:42 a.m.

New subject: file system error

Greetings,

That's looking likely, but it seems that 48M are being eaten somewhere (quite possibly SMA). Unless you actually want an SMA that big, it might be good to look at the SMA registers using lspci after a LinuxBIOS boot. You'll probably have to add some code to initialize that register (or those registers). It could be that SMA is taking 48M due to an undocumented or reserved value in the register.

If this is the cause, the corruption could be infrequent with video not initialized.

The test to confirm this is to examine the registers under LinuxBIOS vs. the OEM BIOS, set the SMA register explicitly to a smaller value, and see whether the mem= parameter can be larger in that case.

G'day, sjames
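The 48M figure follows from numbers reported earlier in the thread: 128 MB installed, and stable operation only below mem=80m. A small worked sketch of that arithmetic, with a hypothetical helper name and the thread's figures as constants:

```python
INSTALLED_MB = 128     # from the thread: one 128 MB bank
STABLE_BELOW_MB = 80   # from the thread: mem= values below 80m work
SMA_MAX_MB = 32        # from the thread: SMA reserves 8 to 32 MB


def unaccounted_mb(installed_mb, stable_limit_mb):
    """Memory the kernel cannot safely use, whatever is consuming it."""
    return installed_mb - stable_limit_mb


if __name__ == "__main__":
    gap = unaccounted_mb(INSTALLED_MB, STABLE_BELOW_MB)
    print("unusable:", gap, "MB")
    print("beyond the largest documented SMA size:", gap - SMA_MAX_MB, "MB")
```

The gap being larger than the biggest SMA size mentioned is what makes a reserved or undocumented register value a plausible suspect.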

Ronald G. Minnich

11:19 a.m.

New subject: file system error

On Mon, 9 Dec 2002, steven james wrote:

...

That's looking likely, but it seems that 48M are being eaten somewhere (quite possibly SMA). Unless you actually want an SMA that big, it might be good to look at the SMA registers using lspci after a LinuxBIOS boot. You'll probably have to add some code to initialize that register (or those registers). It could be that SMA is taking 48M due to an undocumented or reserved value in the register.

What's odd here is that the memory sizer functions should reduce the memory by the SMA size.

Ronald G. Minnich

11:21 a.m.

New subject: file system error

ah, ok, looking at this lspci.

I don't believe there is support in linuxbios for calculating the SMA size on this chipset. There is on e.g. sis 630, so in the sis 630 sizeram() we figure out the SMA window size and reduce the ram size by that amount.
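A minimal sketch of that adjustment, not the actual sis 630 code: the helper name and the KB units are assumptions for illustration, and the chipset-specific step of reading the SMA size out of the registers is left out because it is unknown for this chipset.

```c
#include <stdint.h>

/* Hypothetical helper: given the total RAM found by the memory sizer and
 * the shared memory area (SMA) size, both in KB, return the amount of RAM
 * that should be reported as usable. How sma_kb is obtained is chipset
 * specific and is not shown here. */
static uint32_t usable_ram_kb(uint32_t total_kb, uint32_t sma_kb)
{
    /* A value at or above installed RAM cannot be right; in that case
     * treat it as unreliable and report the RAM size unchanged. */
    if (sma_kb >= total_kb)
        return total_kb;
    return total_kb - sma_kb;
}
```

With 128M installed and an 8M SMA window this would report 120M.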

Munjun, how do we compute SMA size on this chipset? can you send me C code to do this? This is a needed upgrade to the chipset support.

ron

Ronald G. Minnich

11:23 a.m.

New subject: file system error

more questions, sorry.

This is an 8605. We need to get that code into the source tree. I assume you just assumed 8601 code would work?

I'd like to get this worked out.

ron

Kevin Hester

8:06 p.m.

New subject: Starting the real time clock on virgin systems

Hi all,

First I'd like to describe a problem I've encountered:

I have a virgin motherboard that has never been powered up before. i.e. this board was not manufactured elsewhere and a 'standard' BIOS has never been used on it.

When booting this board I discovered an interesting problem: the boot would hang when the "hwclock" tool was invoked by /etc/rcS.d/<some script that reads the rtc>.

The underlying problem is that this common linux utility is reading the RTC via the standard IO ports 70-71. Within this RTC window all of the dallas semiconductor RTC clones use a few bits in register 0x0a to enable the clock when power is down. The default values of these bits do not enable the clock - presumably to avoid draining the battery until the boards are first placed into production.
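For readers unfamiliar with the 70-71 window, here is a rough sketch of the index/data access pattern. The port_io structure, the rtc_read/rtc_write names and the in-memory fake are all assumptions made so the sketch can run without hardware; on a real board the callbacks would be the platform's own outb/inb primitives.

```c
#include <stdint.h>

#define RTC_INDEX_PORT 0x70  /* write the register index here */
#define RTC_DATA_PORT  0x71  /* then read or write the register value here */
#define RTC_REG_A      0x0a  /* the register discussed above */

/* Hypothetical port-I/O backend, used only for illustration. */
struct port_io {
    void    (*outb)(uint8_t value, uint16_t port);
    uint8_t (*inb)(uint16_t port);
};

/* Select a register through the index port, then read it via the data port. */
static uint8_t rtc_read(const struct port_io *io, uint8_t index)
{
    io->outb(index, RTC_INDEX_PORT);
    return io->inb(RTC_DATA_PORT);
}

/* Select a register through the index port, then write it via the data port. */
static void rtc_write(const struct port_io *io, uint8_t index, uint8_t value)
{
    io->outb(index, RTC_INDEX_PORT);
    io->outb(value, RTC_DATA_PORT);
}

/* Tiny in-memory fake of the RTC register file (not real hardware). */
static uint8_t fake_regs[128];
static uint8_t fake_index;

static void fake_outb(uint8_t value, uint16_t port)
{
    if (port == RTC_INDEX_PORT)
        fake_index = value & 0x7f;      /* keep the index inside the fake array */
    else if (port == RTC_DATA_PORT)
        fake_regs[fake_index] = value;
}

static uint8_t fake_inb(uint16_t port)
{
    return (port == RTC_DATA_PORT) ? fake_regs[fake_index] : 0xff;
}

static const struct port_io fake_io = { fake_outb, fake_inb };
```

Passing the port routines in as callbacks is only a convenience for trying the pattern in user space; firmware code would normally call its port primitives directly.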

I've modified my version of linuxbios to ensure that these bits are set to enable the RTC updates. My question is, where is the best place to make this change?

1) In some non linuxbios component (i.e. some little app run at boot time)

2) In linuxbios, but restricted to my mainboard.

3) In linuxbios, but in 'common' code that applies to all intel boards.

I'm in favor of option 3, but I thought I'd ask first. I think this problem would apply to any board. The reason we haven't seen it before is that most folks are running linux bios on boards that once had a standard bios. The standard bios has already 'activated' the RTC updates.

What do you think?

Kevin

Ronald G. Minnich

8:15 p.m.

New subject: Starting the real time clock on virgin systems

On Mon, 9 Dec 2002, Kevin Hester wrote:

...

I'm in favor of option 3, but I thought I'd ask first. I think this problem would apply to any board. The reason we haven't seen it before is that most folks are running linux bios on boards that once had a standard bios. The standard bios has already 'activated' the RTC updates.

how about in a new file pc80/rtc.c. That is where legacy pc files are. In the rtc.c file you can set up the code to do this. We'll make its inclusion conditional on a config variable.

I actually did have this same problem on the l440gx, but never knew what it was ...

thanks

ron

steven james

10 Dec

10:19 a.m.

New subject: Starting the real time clock on virgin systems

Greetings,

It is probably a good idea to have this code in place in general (if it can avoid doing the wrong thing if the RTC is already initialized). I've seen boards where the clock reverts to the uninitialized state. This is most likely whenever the battery is replaced, or in some cases where the CMOS is cleared by jumper. In other cases, it seems to happen for no discernible reason (could be physical vibration causing intermittent contact w/ battery when being moved).

I do agree that it should be an option.

G'day, sjames

ebiederman＠lnxi.com

10:02 p.m.

New subject: Starting the real time clock on virgin systems

Kevin Hester kevinh@ispiri.com writes:

...

Hi all,

First I'd like to describe a problem I've encountered:

I have a virgin motherboard that has never been powered up before. i.e. this board was not manufactured elsewhere and a 'standard' BIOS has never been used on it.

When booting this board I discovered an interesting problem: the boot would hang when the "hwclock" tool was invoked by /etc/rcS.d/<some script that reads the rtc>.

The underlying problem is that this common linux utility is reading the RTC via the standard IO ports 70-71. Within this RTC window all of the dallas semiconductor RTC clones use a few bits in register 0x0a to enable the clock when power is down. The default values of these bits do not enable the clock - presumably to avoid draining the battery until the boards are first placed into production.

I've modified my version of linuxbios to ensure that these bits are set to enable the RTC updates. My question is, where is the best place to make this change?

Assuming your RTC hardware is in your southbridge: src/southbridge/<manufacturer>/<chipset>/rtc.c or something like that.

It is my experience that only for reading the real time clock is a real time clock a real time clock. The control functions which are handled rarely tend to be specific to an individual implementation. Though there are generally similarities within a family of implementations.

...

In some non linuxbios component (i.e. some little app run at boot time)

In linuxbios, but restricted to my mainboard.

In linuxbios, but in 'common' code that applies to all intel boards.

In linuxbios common code that applies to your southbridge. For now it will probably work best to have that code called from your mainboard, and others who need it can call that code as well.

...

I'm in favor of option 3, but I thought I'd ask first. I think this problem would apply to any board. The reason we haven't seen it before is that most folks are running linux bios on boards that once had a standard bios. The standard bios has already 'activated' the RTC updates.

I suspect being the second BIOS on the boards has certainly had something to do with it. But given how much chips vary this may simply be an oddity of your particular variation of the board.

...

What do you think?

Until I see proof that this feature was in the original motorola mc146818 real time clock, and has been in all implementations thereafter, I don't want the code to apply to all boards indiscriminately.

Eric

Kevin Hester

10:55 p.m.

New subject: Starting the real time clock on virgin systems

Hi,

Re: proof that this feature is in the mc146818

Sure, I agree - proof good. Sadly, the only datasheets I can find on the net are for newer clones of this chip. The common dallas semi part datasheet doesn't show the register usage (arrgh!). However, this via clone does show register usage (attached).

For REGISTERA it says that DV0-DV2 must have the value of 010 to turn on the oscillator. The default value of these bits is zero, and the award BIOS on my board will init these bits correctly if they are found to be zero.
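As a small sketch of that bit manipulation (the macro and function names are made up for illustration; the bit positions, bits 4-6 of register 0x0a, are the ones mentioned elsewhere in this thread):

```c
#include <stdint.h>

#define RTC_REG_A_DV_MASK 0x70  /* DV2..DV0 occupy bits 6..4 of register A */
#define RTC_REG_A_DV_ON   0x20  /* 010b in bits 6..4, i.e. only bit 5 set */

/* Return the register A value with the divider bits forced to 010b,
 * leaving the remaining bits (such as the rate select bits) untouched. */
static uint8_t rtc_reg_a_enable_osc(uint8_t reg_a)
{
    return (uint8_t)((reg_a & ~RTC_REG_A_DV_MASK) | RTC_REG_A_DV_ON);
}

/* Nonzero if the divider bits already hold 010b. */
static int rtc_osc_enabled(uint8_t reg_a)
{
    return (reg_a & RTC_REG_A_DV_MASK) == RTC_REG_A_DV_ON;
}
```

Checking rtc_osc_enabled() first and only rewriting the register when it returns zero would mirror what the award BIOS is described as doing.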

The problem occurs in the linux hwclock tool. In rtc.c it spins waiting for the second count to change. If the RTC has not had the oscillator enabled, the second count does not change.
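To illustrate why a stopped oscillator turns into a hang, and how a bounded wait would avoid it, here is a sketch. It is not hwclock's actual code; the callback type, the poll limit and the two simulated clocks are assumptions for illustration only.

```c
#include <stdint.h>

/* Hypothetical callback that returns the current RTC seconds value. */
typedef uint8_t (*read_seconds_fn)(void *ctx);

/* Poll until the seconds value changes, but give up after max_polls reads.
 * Returns 1 if a tick was observed, 0 if the clock appears to be stopped.
 * An unbounded version of this loop never returns on a stopped clock. */
static int wait_for_tick(read_seconds_fn read_seconds, void *ctx,
                         unsigned long max_polls)
{
    uint8_t start = read_seconds(ctx);
    unsigned long i;

    for (i = 0; i < max_polls; i++) {
        if (read_seconds(ctx) != start)
            return 1;
    }
    return 0;
}

/* Simulated clock whose oscillator is off: the value never changes. */
static uint8_t stopped_clock(void *ctx)
{
    (void)ctx;
    return 42;
}

/* Simulated running clock: the value changes on the fifth read. */
static uint8_t ticking_clock(void *ctx)
{
    unsigned *reads = ctx;
    return (++*reads >= 5) ? 43 : 42;
}
```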

For the patches I've submitted, I've simply called rtc_init(0) from the vt8601 southbridge init. I didn't want to change other systems right now.

Ron mentioned he had seen this exact behavior on some 440bx(?) board he was working on. If someone is out there with some different chip set, you can do an experiment:

* Shut the system down
* Remove the battery
* Wait a good long time
* Insert the battery
* Boot linuxbios & linux. If hwclock hangs, then you have this problem.

Kevin

steven james

11:50 p.m.

New subject: Starting the real time clock on virgin systems

Greetings,

Confirmed on Intel Clearwater (e7500 chipset w/ RTC built in to NatSemi superio).

G'day, sjames

Ronald G. Minnich

11 Dec

1:08 a.m.

New subject: Starting the real time clock on virgin systems

so my suggestion of pc80/rtc.c is not acceptable? This seems to me to be a generic pc80 issue.

ron

steven james

2:06 a.m.

New subject: Starting the real time clock on virgin systems

Greetings,

I think pc80/rtc.c IS the correct place for it. AFAIK, it is generic.

G'day, sjames

ebiederman＠lnxi.com

5:12 a.m.

New subject: Starting the real time clock on virgin systems

steven james pyro@linuxlabs.com writes:

...

Greetings,

I think pc80/rtc.c IS the correct place for it. AFAIK, it is generic.

O.k. I have looked, and now that I have actually looked up what is being referred to I can confirm it is in the original motorola chip. DV2, DV1, DV0 and the various reference clocks are all present in the reference chip. Initially when the problem was brought up I did not see what everyone was talking about.

The only problem with pc80/rtc.c is that pc80/mc146818rtc.c already exists and even has this code. It is just a matter of calling rtc_init on most of the boards that need it. It has a long name only because it references the original motorola part number.

And the default for the timer base is already in the existing code.

So calling rtc_init from an appropriate place appears to be all that is required. Which I do from ich2_rtc_init() and ich3_rtc_init().

We still need an implementation/southbridge specific routine to see if the battery failed, but otherwise the existing code should work pretty much as is.

I tracked down a lot of the issues when figuring out what it would take to store parameters in the cmos ram, and apparently no one else noticed.. And for me the internal state had simply swapped out.

So been there, done that. The code already exists, works and is generic. Feel free to call it from the appropriate southbridge fixup routines if you want. There is even a define so boards with strange configurations can overrule the default configuration.

Eric

Ronald G. Minnich

10:38 a.m.

New subject: Starting the real time clock on virgin systems

On 11 Dec 2002, Eric W. Biederman wrote:

...

The only problem with pc80/rtc.c is that pc80/mc146818rtc.c already exists and even has this code. It is just a matter of calling rtc_init on most of the boards that need it. It has a long name only because it references the original motorola part number.

OK.

...

So calling rtc_init from an appropriate place appears to be all that is required. Which I do from ich2_rtc_init() and ich3_rtc_init().

I wonder if we might not call rtc_init from hardware main based upon a config variable? CONFIG_RTC_INIT? Is there harm done if you call it but it did not need to be called? I have never looked at the RTC, as you can tell.
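One way the config-variable idea could look, as a rough sketch only. CONFIG_RTC_INIT is the option name floated above and does not exist yet; rtc_init_stub stands in for the existing rtc_init so the sketch stays self-contained, and maybe_init_rtc is a hypothetical hook name.

```c
/* Assumption: CONFIG_RTC_INIT is the proposed option. It is switched on
 * here so that the sketch actually exercises the call. */
#ifndef CONFIG_RTC_INIT
#define CONFIG_RTC_INIT 1
#endif

/* Stand-in for the existing rtc_init(); the real routine is not
 * reproduced here. The counter exists only so the sketch can be checked. */
static int rtc_init_calls;

static void rtc_init_stub(int invalid)
{
    (void)invalid;
    rtc_init_calls++;
}

/* Hypothetical hook that a hardware main routine could call once. */
static void maybe_init_rtc(void)
{
#if CONFIG_RTC_INIT
    rtc_init_stub(0);
#endif
}
```

Boards that define CONFIG_RTC_INIT as 0 would compile the call out entirely.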

ron

Ronald G. Minnich

10:32 a.m.

New subject: Starting the real time clock on virgin systems

Kevin can you send me a simple piece of code to use for pc80/rtc.c that fixes the problem?

ron

Ronald G. Minnich

10:48 a.m.

New subject: Starting the real time clock on virgin systems

On Wed, 11 Dec 2002, Ronald G. Minnich wrote:

...

Kevin can you send me a simple piece of code to use for pc80/rtc.c that fixes the problem?

OK just caught up. Yup, I had not noticed the motorola code you put in, Eric. Now that I am looking at it, it seems like that is what we need.

It's always interesting going away from the tree for a bit and coming back, all sorts of improvements crop up :-)

ron

ebiederman＠lnxi.com

3:01 p.m.

New subject: Starting the real time clock on virgin systems

"Ronald G. Minnich" rminnich@lanl.gov writes:

...

On Wed, 11 Dec 2002, Ronald G. Minnich wrote:

...
Kevin can you send me a simple piece of code to use for pc80/rtc.c that fixes the problem?

OK just caught up. Yup, I had not noticed the motorola code you put in, Eric. Now that I am looking at it, it seems like that is what we need.

I think if you check, you will find that code has been in the tree for a year or so. The only specific thing I remember is that it was strongly inspired by the alpha port. So I think it was done as part of the port to the tyan s2462, which I did immediately after the alpha.

...

It's always interesting going away from the tree for a bit and coming back, all sorts of improvements crop up :-)

And sometimes things just get overlooked, but yes an open project can be interesting that way.

Eric

Kevin Hester

4:47 p.m.

New subject: Starting the real time clock on virgin systems

Yep,

When I went to add the rtc init I discovered this. I added a call to the existing rtc_init(0) from the via southbridge setup and all was well. It seems like the consensus is to move this call into hardwaremain with some sort of option-driven ifdef. True?

Kevin

ebiederman＠lnxi.com

7:21 p.m.

New subject: Starting the real time clock on virgin systems

Kevin Hester kevinh@ispiri.com writes:

...

Yep,

When I went to add the rtc init I discovered this. I added a call to the existing rtc_init(0) from the via southbridge setup and all was well. It seems like the consensus is to move this call into hardwaremain with some sort of option-driven ifdef. True?

The consensus is that rtc_init is good. I am leery about moving the actual call into mainboard.c; some boards don't even have a real time clock.

And I definitely need a wrapper for it on the boards I use. Leaving the call in mainboard.c/southbridge.c is probably good for the moment.

There is some hardware description work I was playing with, and it may be worth enhancing that architecture so that you just specify which hardware you have and how it is all hooked up.

But I certainly don't have the time to work through what that would take to have things work reliably.

Eric

ebiederman＠lnxi.com

5:01 a.m.

New subject: Starting the real time clock on virgin systems

steven james pyro@linuxlabs.com writes:

...

Greetings,

Confirmed on Intel Clearwater (e7500 chipset w/ RTC built in to NatSemi superio).

You are not using the ich3?

Eric

steven james

10:45 a.m.

New subject: Starting the real time clock on virgin systems

Greetings,

Actually, I should have said westville, but they use portions of the NatSemi AND ich3 for some reason.

Intel designs tend to work well, but often have odd design decisions.

G'day, sjames

ebiederman＠lnxi.com

3:01 p.m.

New subject: Starting the real time clock on virgin systems

steven james pyro@linuxlabs.com writes:

...

Greetings,

Actually, I should have said westville, but they use portions of the NatSemi AND ich3 for some reason.

Intel designs tend to work well, but often have odd design decisions.

I have strong memory of some of the failures of the odd design decisions. Intel motherboards have never inspired me as simple designs.

Eric

Ronald G. Minnich

1:07 a.m.

New subject: Starting the real time clock on virgin systems

On Tue, 10 Dec 2002, Kevin Hester wrote:

...

Boot linuxbios & linux. If hwclock hangs, then you have this problem.

verified on the L440GX and L440GX+

ron

steven james

10 Dec

11:47 p.m.

New subject: Starting the real time clock on virgin systems

Greetings,

If the init is what I think it is (setting bit 5 in RTC index 0x0a, the divider), it should be the same for all RTCs. The part that tends to differ is the PnP stuff. The index accesses from port 0x70-71 tend to stick to standards.

G'day, sjames

ebiederman＠lnxi.com

11 Dec

4:49 a.m.

New subject: Starting the real time clock on virgin systems

steven james pyro@linuxlabs.com writes:

...

Greetings,

If the init is what I think it is (setting bit 5 in RTC index 0x0a, the divider), it should be the same for all RTCs. The part that tends to differ is the PnP stuff. The index accesses from port 0x70-71 tend to stick to standards.

Detecting battery failure also tends to vary to the extent that on some tyan motherboard via chipset systems the only way I have found so far is a cmos checksum. The cmos clear jumper is actually a cmos corrupt jumper on that board. On other boards cmos clear tends to set a magic bit.
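A checksum test of that kind might look roughly like the sketch below. The function name is hypothetical, and the covered range and checksum location are passed in as parameters because the CMOS layout is board and firmware specific; nothing here is taken from the tyan code.

```c
#include <stdint.h>

/* Sum the bytes in cmos[start..end] (inclusive) and compare the result
 * against a 16-bit checksum stored high byte first at cmos[sum_at] and
 * cmos[sum_at + 1]. Returns 1 on a match, 0 if the contents look corrupt
 * (for example after a battery failure or a "cmos corrupt" jumper). */
static int cmos_checksum_ok(const uint8_t *cmos, unsigned start,
                            unsigned end, unsigned sum_at)
{
    uint16_t sum = 0;
    uint16_t stored;
    unsigned i;

    for (i = start; i <= end; i++)
        sum = (uint16_t)(sum + cmos[i]);

    stored = (uint16_t)((cmos[sum_at] << 8) | cmos[sum_at + 1]);
    return sum == stored;
}
```

A call such as cmos_checksum_ok(cmos, 0x10, 0x2d, 0x2e) would match the traditional PC layout, which is assumed here rather than confirmed for any board in this thread. Note that a plain sum also accepts an all-zero image, so real code may want an additional check.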

Eric

steven james

10:41 a.m.

New subject: Starting the real time clock on virgin systems

Greetings,

I have seen that. It figures! I do note that it still needs bits 4-6 of index 0x0a set to 010b. At minimum, it will do no harm to set that on the Tyan boards.

G'day, sjames

