Hello.
This patch works for me but needs a small function calc_id_buffer for each board/cpu/whatever. I only made one for amd quadcore because it is what I have. My board does not get to ramstage, so it might not work there. It works for my serial console but should work for net or usb if I'm not mistaken. Also my tree is not up to date with svn head. Likely things could be placed elsewhere or done better, so it is mostly an idea.
But I'm unlikely to have more time for it and it works for me so far, so I'm sending it so that it does not get lost and in case somebody likes the idea and wants to commit or do it better.
Signed-off-by: Xavi Drudis Ferran xdrudis@tinet.cat
My problem was with having serial output like
[...] init node: 00 cores: 03 Start other core - nodeid: 00 cores: 03 start_other_cores() retuPPPrOOOnSSSeTTTd::: P000OxxxST3:330 00 0
x3 7cc cosootrrraeeexxxr:::t e d-- --a---p-- {{a{ p iAAAPPPcIIIiCCdCI:IID DD === 000231 NNNOOODDDEEEIIIDDD === 000000 CCCOOORRREEEIIIDDD === 000312}}} --------- [...]
So I changed it because I like it better like this:
[...] ���������������������������������������������������������������� [...]
which is ugly, slow, etc. but I can pipe it to a little script which turns it back into almost the original ("boooh!!!" :( ) or writes it a little nicer ( :) )
[...] core0:init node: 00 cores: 03 core0:Start other core - nodeid: 00 cores: 03 core0:start_other_cores() returned core1:POST: core2:POST: core3:POST: core0:POST: core1: 0x30 core2: 0x30 core3: 0x30 core0:0 core0:x37 core2: c core3: c core0:started ap apicid: core1:corex: --- { APICID = 01 NODEID = 00 COREID = 01} --- core2:orex: --- { APICID = 02 NODEID = 00 COREID = 02} --- core3:orex: --- { APICID = 03 NODEID = 00 COREID = 03} --- [...]
or even ( :) )
[...] init node: 00 cores: 03| | | | | | Start other core - nodei| | | d: 00 cores: 03 | | | start_other_cores() retu|P |P |P rned |OST: |OST: |OST: POST: | 0x30 | 0x30 | 0x30 0 | | | | | | x37 | | c | c started ap apicid: |corex: --- { APICID = 0|orex: --- { APICID = 02|orex: --- { APICID = 03 |1 NODEID = 00 COREID = 0| NODEID = 00 COREID = 0 | NODEID = 00 COREID = 03 |1} --- |2} --- |} --- | | | [...]
or other options to if you fancy.
I observed characters from each core got mixed but bits from each character not, so the superIO or someone probably did alreday synchronize bytes correctly. So I could split each byte in two, add a 4 bit buffer id and pretend each core has a different serial port and I'm multiplexing up to 16 channels for up to 16 cores into one physical serial port, by sending together the channel/buffer id with the character (4 bits buffer id, 4 bits half character). Then a small perl script demultiplexes each channel into a different buffer and formats output. If someone has more than 16 cores does she really want to see ouput from all at a time? Redefining the weak function calc_id_buffer you can choose to have some of them mix into the same buffer or just filter out the output for some of them.
Hope the patch is simple enough, else I can explain more later. I must leave now.
On Sat, Jan 29, 2011 at 11:09:05AM +0100, xdrudis wrote:
is what I have. My board does not get to ramstage, so it might not work there. It works for my serial console but should work for net or
Ok, now I see it. It won't work with sprintf in ramstage.
I shouldn't have modified vtxprintf but maybe console.c, either calling calc_id_buffer for each byte in __console_tx_byte (too expensive ?) or passing around id_buffer by more extensive modification (like changing tx_byte signature to add a channel parameter or something, maybe not worth it)
Forget it and sorry for the noise.
Am Sonntag, 30. Januar 2011, um 15:46:01 schrieb xdrudis:
Forget it and sorry for the noise.
No need to be sorry, the idea is sound.
Patrick
* xdrudis xdrudis@tinet.cat [110130 15:46]:
On Sat, Jan 29, 2011 at 11:09:05AM +0100, xdrudis wrote:
is what I have. My board does not get to ramstage, so it might not work there. It works for my serial console but should work for net or
Ok, now I see it. It won't work with sprintf in ramstage.
It should not be needed in ram stage, as we have locking there.
Stefan
On Sun, Jan 30, 2011 at 08:32:58PM +0100, Stefan Reinauer wrote:
- xdrudis xdrudis@tinet.cat [110130 15:46]:
On Sat, Jan 29, 2011 at 11:09:05AM +0100, xdrudis wrote:
is what I have. My board does not get to ramstage, so it might not work there. It works for my serial console but should work for net or
Ok, now I see it. It won't work with sprintf in ramstage.
It should not be needed in ram stage, as we have locking there.
Yes, it'd be mostly unneeded, but anyway the patch I sent does not disable it in ramstage. So it still causes sprintf to consume the double of bytes maybe beyond its buffers (and produce unreadable messages). The idea can work, the perl script may be usable, but the patches to C should be redone from scratch.
In fact I think there's some subtle breakage already in romstage.c, since I didn't realise console_tx will check for \n to add \r and that won't work with CONFIG_DEBUX_MUXED in that patch, although it won't be noticed unless you have 16 cores (I think an \n from the 16th core would produce a \n).
But continuing with the idea, the perl script might be changed to detect some string in the input and stop demultiplexing. Sonner or later coreboot may stop multiplexing or simply stop running and have some other software writing to serial port, and it'd be nice if the script would then simply copy the input unchanged (to all outputs if it's not writing to stdout only ?).
* xdrudis xdrudis@tinet.cat [110130 20:59]:
Yes, it'd be mostly unneeded, but anyway the patch I sent does not disable it in ramstage. So it still causes sprintf to consume the double of bytes maybe beyond its buffers (and produce unreadable messages). The idea can work, the perl script may be usable, but the patches to C should be redone from scratch.
Can't you just hack up the muxing in console_tx_byte instead? That should solvle the problem and make the patch a tad less intrusive.
Stefan
On Sun, Jan 30, 2011 at 09:07:52PM +0100, Stefan Reinauer wrote:
- xdrudis xdrudis@tinet.cat [110130 20:59]:
Yes, it'd be mostly unneeded, but anyway the patch I sent does not disable it in ramstage. So it still causes sprintf to consume the double of bytes maybe beyond its buffers (and produce unreadable messages). The idea can work, the perl script may be usable, but the patches to C should be redone from scratch.
Can't you just hack up the muxing in console_tx_byte instead? That should solvle the problem and make the patch a tad less intrusive.
yes, console_tx_byte or __console_tx_byte .
The only drawback is that then I have to call calc_id_buffer for each byte, I guess it doesn't matter, but it was less overhead calling it once per call to vtxprintf (in fact there was another call in number(...) but still fewer than one per byte.
But I just got past fidvid now, and I should clean up the mess of code I've been changing all these months and see what exactly fixed it, or if I can have a decent patch. Not sure it'll be much help for other CPU revisions beyond C3, though.
On Sun, Jan 30, 2011 at 12:32 PM, Stefan Reinauer stefan.reinauer@coreboot.org wrote:
- xdrudis xdrudis@tinet.cat [110130 15:46]:
On Sat, Jan 29, 2011 at 11:09:05AM +0100, xdrudis wrote:
is what I have. My board does not get to ramstage, so it might not work there. It works for my serial console but should work for net or
Ok, now I see it. It won't work with sprintf in ramstage.
It should not be needed in ram stage, as we have locking there.
I think that the locking can be added via the BSPs cache. All multicore should use CAR and it is a matter of adding it where it won't get stepped on by the normal use of CAR. For AMD fam10, the sysinfo setup would need to be fixed up.
Marc
On Sun, Jan 30, 2011 at 01:05:54PM -0700, Marc Jones wrote:
I think that the locking can be added via the BSPs cache. All multicore should use CAR and it is a matter of adding it where it won't get stepped on by the normal use of CAR. For AMD fam10, the sysinfo setup would need to be fixed up.
Marc
That is a little over my head, I believe. I thought in general the cache was a different memory for each core (let alone each node). My phenom has L3 cache common to all cores but I thought romstage is not using it.
And I guess the functionality is slightly different :
with muxing you get the node number in the output so you can format things in columns, etc. And it is not dependent on the architecture (except to find out the core number to use as buffer id). With locking you just separate output but don't know which core it comes from (but it becomes feasible to include the core number in the messages where appropiate, and you don't need demuxing, get less serial traffic...).
locking in romstage might be useful for other things beyond console output, though.
So both approaches have advantages. I'm just not sure how to implement locking. That's the main reason I didn't, even if some comment suggested it somewhere. Secondary excuses were that it seemed less alteration of the runtime behaviour not to lock, and that I could also set it up to filter out some output.
On Sat, Jan 29, 2011 at 11:09:05AM +0100, xdrudis wrote:
formats output. If someone has more than 16 cores does she really want to see ouput from all at a time? Redefining the weak function calc_id_buffer you can choose to have some of them mix into the same buffer or just filter out the output for some of them.
Also wrong. Having several concurrent outputs with the same buffer id is not safe, demuxing could join the wrong half-characters.
Filtering out some cores would be an option.
Am 30.01.2011 16:04, schrieb xdrudis:
Also wrong. Having several concurrent outputs with the same buffer id is not safe, demuxing could join the wrong half-characters.
Filtering out some cores would be an option.
Or just use more bits for cores once the need arises. That will give you up to 128 cores. That should be enough for a while.
Patrick