Hello,
It makes me wonder why CAR on the APs uses the same stack. How can this work? I thought the CPUs somehow keep their caches coherent between them. I see that the Fam10h CAR code allocates 1 KB for each AP, but the pre-Fam10h code does not.
How can this work?
The rationale for the question is to have some kind of mutex for serial console printouts and for network-over-console. The secret plan is to print output from different CPUs on different UDP ports ;)
The second reason is that we really need some inter-CPU mutex for PCI access.
Is it correct that, in principle, all CPUs share the cache contents once past the CAR stage?
Thanks, Rudolf
On 6/6/10 5:01 PM, Rudolf Marek wrote:
> Hello,
> It makes me wonder why CAR on the APs uses the same stack.
Maybe it is not coherent? Or maybe it doesn't really work?
> I see that the Fam10h CAR code allocates 1 KB for each AP, but the pre-Fam10h code does not.
> How can this work?
> The rationale for the question is to have some kind of mutex for serial console printouts and for network-over-console. The secret plan is to print output from different CPUs on different UDP ports ;)
> The second reason is that we really need some inter-CPU mutex for PCI access.
Or we need to stop doing PCI accesses and console output in SMP before we have RAM.
I somehow can't imagine that the speed improvement we get from doing concurrent PCI config space accesses to some 100 registers really makes up for the trouble we get ourselves into by trying to wildly spread configuration tasks among CPUs (and doing it differently on basically every mainboard that uses K8 or K10).
Stefan
I think it is mostly because memory init is done by the APs. Is this true for some boards?
Thanks, Rudolf
On 6/6/10 5:42 PM, Rudolf Marek wrote:
> I think it is mostly because memory init is done by the APs. Is this true for some boards?
Afaik it's "ECC clearing" which is implemented several times in the tree, including stage2. It needs no PCI access nor console output, though... and parallelizing the burden of PCI config space writes is not where the speedup lives.
Stefan
> Afaik it's "ECC clearing" which is implemented several times in the tree, including stage2.
Nope, the APs can init the memory controller too. Check CONFIG_MEM_TRAIN_SEQ:
0 = BSP only
1 = train_ram_on_node is called from init_cpus
2 = dunno - looks like it is also done in parallel, but I could not find how it works
Lots of boards set it to 2; I think only one sets it to 1.
Thanks, Rudolf
On 6/6/10 6:29 PM, Rudolf Marek wrote:
> Afaik it's "ECC clearing" which is implemented several times in the tree, including stage2.
> Nope, the APs can init the memory controller too. Check CONFIG_MEM_TRAIN_SEQ:
> 0 = BSP only
> 1 = train_ram_on_node is called from init_cpus
> 2 = dunno - looks like it is also done in parallel, but I could not find how it works
> Lots of boards set it to 2; I think only one sets it to 1.
> I think 2 is for calling it from CAR..
> Still wondering how much time we save by parallelizing this... Did anyone take a measurement?
I've talked to Marc Jones about this several times over the years. He can confirm my memory. There is almost no win to parallelizing any of the memory or PCI bus setup. Yes, it's supported in the code, kind of, for some platforms, and maybe it works on some of them, but it's not worth it and it really complicates things.
What is worth it, and we've measured this, is ECC scrubbing. We should focus on that.
So the boot path: the BSP does all the device tree and DRAM setup, and sets up stacks and boot code for the APs.
The APs are woken up and do what they are told, which in many cases is to set themselves up and do ECC scrubbing.
In other words, Stefan is right (again :-)
ron