Hi everybody,
In our leadership meeting[1] we discussed how we should deal with tree-wide changes (ranging from "new file header" to "some API is gone now"). The concern was that right now, anything can change at any time, making local development harder.
There have been a few ideas but nothing definite yet:
One idea was to declare some interfaces (e.g. those used by src/mainboards/**), with varying stability horizons (e.g. "only change right after a release"), or backward compatibility guarantees (although the details still seemed hazy on how that could work in practice.)
Another idea brought up was to require that such changes come with documentation and, ideally, migration support in the form of scripts and the like. We had something like this in the past[2] and I created a proposal[3] to establish it as a rule and build a culture around documenting sweeping changes.
One doesn't exclude the other, and there may be other approaches on how to make development easier without calcifying our codebase. Or maybe people don't see a problem that needs fixing?
In any case, I wanted to bring it up in a larger forum to make sure that we find rough consensus across the community before a decision is made on how to proceed here.
Regards, Patrick
[1] minutes at https://docs.google.com/document/d/1NRXqXcLBp5pFkHiJbrLdv3Spqh1Hu086HYkKrgKj...
[2] http://web.archive.org/web/20130315025026/http://www.coreboot.org/Flag_Days
[3] https://review.coreboot.org/c/coreboot/+/52576
Hi,
On 21.04.21 20:33, Patrick Georgi via coreboot wrote:
Hi everybody,
In our leadership meeting[1] we discussed how we should deal with tree-wide changes (ranging from "new file header" to "some API is gone now"). The concern was that right now, anything can change at any time, making local development harder.
There have been a few ideas but nothing definite yet:
One idea was to declare some interfaces (e.g. those used by src/mainboards/**), with varying stability horizons (e.g. "only change right after a release"), or backward compatibility guarantees (although the details still seemed hazy on how that could work in practice.)
That would probably increase the burden of introducing such APIs. They'd have to be designed much more carefully if they can't be changed at any time. That would also apply to our devicetree format, which already evolves too slowly, IMHO.
Another idea brought up was to require that such changes come with documentation and, ideally, migration support in the form of scripts and the like. We had something like this in the past[2] and I created a proposal[3] to establish it as a rule and build a culture around documenting sweeping changes.
I think it's nice to use a script and if one does, sharing it is obviously a good thing. But demanding that would again burden upstream development.
One doesn't exclude the other, and there may be other approaches on how to make development easier without calcifying our codebase. Or maybe people don't see a problem that needs fixing?
I see a problem, but it's quite the opposite. IMO, upstream updates that affect all boards of a platform, for instance, are already too hard. One problem I've encountered multiple times is the lack of detailed public board documentation, e.g. schematics. Alas, for the platforms I work with, the first board ports in the tree are often those with the least public documentation. People with access to the documentation have become more helpful over the years, I think. So it's probably not a big issue anymore.
In any case, I wanted to bring it up in a larger forum to make sure that we find rough consensus across the community before a decision is made on how to proceed here.
Thanks. I think we first need a very precise description of the problem. That should include some numbers (e.g. additional time spent on rebasing due to incompatible changes) to estimate how big the problem actually is. I don't see a big problem yet from the brief descriptions given.
My guess is that somebody is trying to rebase downstream work rather often. Is that the case? If so, I'd ask what the reasons for each rebase are. Obviously, when somebody develops a bunch of patches and wants to upstream them later, that's at least one rebase. Are incompatible, site-wide changes already a problem in this case? Or are we talking about much more rebasing?
Nico
[1] minutes at https://docs.google.com/document/d/1NRXqXcLBp5pFkHiJbrLdv3Spqh1Hu086HYkKrgKj...
[2] http://web.archive.org/web/20130315025026/http://www.coreboot.org/Flag_Days
[3] https://review.coreboot.org/c/coreboot/+/52576
Patrick Georgi via coreboot wrote:
tree-wide changes
..
there may be other approaches on how to make development easier
I'm a big fan of semantic patching as provided by coccinelle and used heavily in Linux kernel development.
Perhaps one way to make lives easier is to require tree-wide changes to be the result of an spatch, which can then be applied downstream too?
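For anyone who hasn't seen SmPL: a semantic patch describes the change once, in a C-like notation, and coccinelle applies it everywhere. A minimal sketch, with hypothetical function names:

    // Rewrite every call of old_api() to new_api(), for any
    // argument expression. (Names are illustrative only.)
    @@
    expression dev;
    @@
    - old_api(dev)
    + new_api(dev)

Something like `spatch --sp-file rename.cocci --in-place --dir src/` then applies it across the tree, and downstream users can run the exact same file on their branches.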
//Peter
On Thu, Apr 22, 2021 at 10:58 PM Peter Stuge peter@stuge.se wrote:
Patrick Georgi via coreboot wrote:
tree-wide changes
..
there may be other approaches on how to make development easier
I'm a big fan of semantic patching as provided by coccinelle and used heavily in Linux kernel development.
Perhaps one way to make lives easier is to require tree-wide changes to be the result of an spatch, which can then be applied downstream too?
I proposed that in https://review.coreboot.org/c/coreboot/+/52576 already because I'm also a fan of this idea.
That said, both in the meeting and in Nico's sibling comment there were concerns about putting additional burden on developers (although, personally, I'd rather review a CL that was created by a tool + a simple rule set than a tree-wide refactoring made by hand...)
Patrick
On 4/21/21 8:33 PM, Patrick Georgi via coreboot wrote:
Hi everybody,
Hi,
In our leadership meeting[1] we discussed how we should deal with tree-wide changes (ranging from "new file header" to "some API is gone now"). The concern was that right now, anything can change at any time, making local development harder.
I already added a comment on Gerrit: https://review.coreboot.org/c/coreboot/+/52576/comment/4033eba1_56d6eab5/
There have been a few ideas but nothing definite yet:
One idea was to declare some interfaces (e.g. those used by src/mainboards/**), with varying stability horizons (e.g. "only change right after a release"), or backward compatibility guarantees (although the details still seemed hazy on how that could work in practice.)
The initial point was related to long-term development on a fork, but based on the changes proposed by Patrick I wanted to raise other concerns.
Any guarantees should have some anchor, e.g. a release version. At this point we all agree that coreboot release points are chosen arbitrarily and provide no quality or API compatibility guarantees. Although this is clearly stated in the documentation, not many people outside the community know that.
From an embedded systems consulting perspective (we have seen coreboot applied in e.g. trains and medical robots), long-term support and some API compatibility are needed. The cost of a massive rebase of patches from some old, or sometimes not so old, version may not be feasible; that's how some customers end up going back to IBVs.
What worries me is that the dislike of backward compatibility, and the ease of throwing away "redundant" baggage of code that blocks tree-wide changes, makes coreboot harder to maintain in the long run for some applications.
This is one part of the problem; the other is specification compatibility, where ACPI is one area that breaks things often. coreboot moves to newer ACPI compiler versions faster than most BSD systems do, which has led to problems with BSD-based firewalls.
Another idea brought up was to require that such changes come with documentation and, ideally, migration support in the form of scripts and the like. We had something like this in the past[2] and I created a proposal[3] to establish it as a rule and build a culture around documenting sweeping changes.
Flag Days look like a good idea; essentially they can work as a guide on where to look for problems if the firmware does not behave the same after a set of tree-wide changes, or if we have to rebase an old fork.
Right now, the committer of a tree-wide change applies the proposed modification to the whole codebase. This is done without any hardware test, and it does not require all maintainers to confirm the change. I'm not sure if any of those changes required code development from the maintainers. It seems the community agrees with that approach, or treats it as a necessity. Maybe those de facto standards should also be written down?
One doesn't exclude the other, and there may be other approaches on how to make development easier without calcifying our codebase. Or maybe people don't see a problem that needs fixing?
What about announcing these changes before a release and, when such changes are released, giving the option to still use the older code base via a config option (though this may be classified as calcifying)?
If a config option is too extreme, then maybe a one-release notice period for switching APIs, like with platform drops, would be good enough?
In any case, I wanted to bring it up in a larger forum to make sure that we find rough consensus across the community before a decision is made on how to proceed here.
Best Regards,
Sorry for being a bit late here, but I wanted to second what Nico said. It's important to not add undue burden to the development process. I think the master branch is meant for development, not for shipping long-term stable products. If you're installing coreboot in a train or medical device, then why on Earth would you want to rebase that onto the latest master after you have stabilized? Cut yourself a private branch and cherry-pick fixes back to that as necessary. As an open source project, coreboot doesn't have anywhere near the resources to do enough QA to guarantee that the tip of the master branch (or any branch or tag, for that matter) was stable enough to be shipped in a product at any point in time... even Linux cannot do that, and they have way more resources than we do. It's always best effort, and if you want to put this on something you want to sell for money, you'll have to pick a point where you take control of it (i.e. cut a branch) and then do your own QA on that.
For the argument of supporting out-of-tree development, honestly, I think coreboot is a GPL project and out-of-tree development is opposed to the spirit of the GPL, if not the letter. The whole point of being open source and copyleft is that we can all work together on one tree and integrate with each others' changes in real time. If someone wants to develop their own patches in a secret cave for a year and then dump them all on the coreboot Gerrit all at once, and they can square that away legally somehow, fine... we can't stop them. But I think the extra friction caused by that is on them and we shouldn't make work for mainline developers harder to support that case.
For the case mentioned with ACPI compatibility, I think it's a bit different -- since coreboot versions can't be tied directly to OS versions, there's value in trying to maintain some back and forwards compatibility for the interfaces crossing that boundary, and I think we generally try to do that where we can. We can try to codify that if people want to. But I think it should be something that encourages maintaining compatibility while still allowing for flexibility where necessary... i.e. the guidelines should be written mostly with "should" and not "must".
I'm okay with maintaining a "running log of major changes" as long as it doesn't create too much of a hassle to maintain. coccinelle spatches can be encouraged where it's useful but I think they should always be optional... some migrations can be easily represented like that but others not so much. And if I'm flipping the arguments in a function that's only used 2 or 3 times in the whole tree, it's kind of overkill to write an spatch.
On 5/5/21 10:56 PM, Julius Werner wrote:
Hi Julius,
Sorry for being a bit late here, but I wanted to second what Nico said. It's important to not add undue burden to the development process. I think the master branch is meant for development, not for shipping long-term stable products. If you're installing coreboot in a train or medical device, then why on Earth would you want to rebase that onto the latest master after you have stabilized?
I didn't say anything about "latest master"; this can be any particular point in time that addresses a given need in the most efficient way.
There are many reasons for rebasing or updating firmware; to name a few: security and maintainability. The second case is interesting since, if you maintain 5 projects that are all 4.12-based, it is way different than maintaining 4.0, 4.6, 4.9, etc.
Cut yourself a private branch and cherry-pick fixes back to that as necessary.
That is way easier to say than to do. Cherry-picking across the tree with such dramatic changes is asking for dependency hell. I will argue this is not economically feasible for most projects.
AFAIK, as a community we do not endorse the UEFI/edk2-like development that happened a couple of years ago (and is probably still going on, although things are getting better). That development model created a code base for every microarchitecture and let each live its own life without any backporting. I was a BIOS engineer at that time and I don't think this is a valid approach, since that model leads to debugging and fixing the same bugs multiple times. Even if a branched project developed something important or found a bug, it almost never landed upstream.
As an open source project, coreboot doesn't have anywhere near the resources to do enough QA to guarantee that the tip of the master branch (or any branch or tag, for that matter) was stable enough to be shipped in a product at any point in time... even Linux cannot do that, and they have way more resources than we do. It's always best effort, and if you want to put this on something you want to sell for money, you'll have to pick a point where you take control of it (i.e. cut a branch) and then do your own QA on that.
This is what we are doing right now.
I wonder if the community and leadership agree with the "set up your own QA" approach.
We will advocate for improved and extended QA for coreboot and any other OSF, since without it, working and doing business is a nightmare that simply blocks growth.
3mdeb does Linux maintenance for industrial embedded systems, so we can easily compare the efforts related to coreboot and Linux maintenance. IMO Linux is doing quite well: setting up stable, LTS and SLTS (10 years of support) branches is a huge win and shows a clear understanding of expanding the project into realms where stability is a key factor. Linux can be maintained way more easily and comes with more and more QA guarantees.
(...)
For the case mentioned with ACPI compatibility, I think it's a bit different -- since coreboot versions can't be tied directly to OS versions, there's value in trying to maintain some back and forwards compatibility for the interfaces crossing that boundary, and I think we generally try to do that where we can. We can try to codify that if people want to. But I think it should be something that encourages maintaining compatibility while still allowing for flexibility where necessary... i.e. the guidelines should be written mostly with "should" and not "must".
Agree with that point.
I'm okay with maintaining a "running log of major changes" as long as it doesn't create too much of a hassle to maintain. coccinelle spatches can be encouraged where it's useful but I think they should always be optional... some migrations can be easily represented like that but others not so much. And if I'm flipping the arguments in a function that's only used 2 or 3 times in the whole tree, it's kind of overkill to write an spatch.
+1
Best Regards,
As an open source project, coreboot doesn't have anywhere near the resources to do enough QA to guarantee that the tip of the master branch (or any branch or tag, for that matter) was stable enough to be shipped in a product at any point in time... even Linux cannot do that, and they have way more resources than we do. It's always best effort, and if you want to put this on something you want to sell for money, you'll have to pick a point where you take control of it (i.e. cut a branch) and then do your own QA on that.
This is what we are doing right now.
I wonder if the community and leadership agree with the "set up your own QA" approach.
We will advocate for improved and extended QA for coreboot and any other OSF, since without it, working and doing business is a nightmare that simply blocks growth.
I mean, I don't think anyone here is going to argue against better QA, it's just hard to do in practice. This is definitely not a burden you can just push on developers -- coreboot supports hundreds of mainboards and most of us only own a handful of those. It's just not practically possible to make everyone who checks in a patch guarantee that they don't break anyone else on any board with it. We all do our best but accidents do and always will happen. The only way to get consistently more confidence in the code base is through automated systems, and those are expensive... we already have good build testing, at least, and our recent unit test efforts should also help a lot. But we don't have any real hardware testing other than those few machines 9elements sponsored which only run after the patch is merged (and which many committers don't pay attention to, I think). If you want a big lab where every patch gets tested on every mainboard, someone needs to set all that up and someone needs to pay for it. I'm actually involved in a similar thing with the trustedfirmware.org project right now who are in the process of setting up such a lab, and I'm not sure if I'm allowed to share exact numbers, but you're quickly in the range of thousands of dollars per mainboard per year just for maintaining it (to say nothing of the upfront development cost).
There are many reasons for rebasing or updating firmware; to name a few: security and maintainability. The second case is interesting since, if you maintain 5 projects that are all 4.12-based, it is way different than maintaining 4.0, 4.6, 4.9, etc.
3mdeb does Linux maintenance for industrial embedded systems, so we can easily compare the efforts related to coreboot and Linux maintenance. IMO Linux is doing quite well: setting up stable, LTS and SLTS (10 years of support) branches is a huge win and shows a clear understanding of expanding the project into realms where stability is a key factor. Linux can be maintained way more easily and comes with more and more QA guarantees.
So, I actually get the feeling that what you really want is well-maintained stable/LTS branches for coreboot releases (like Linux has)? Because for security and bug fixes in a real product, always rebasing onto master is just a bad idea in general. coreboot changes all the time, features get added, changed, serialized data formats differ... you really don't want to keep pushing all those changes onto your finished product and figure out how they affect it every time. You really just want to stick with what you have and only pull in security and bug fixes as they come up, I think.
To that I would say: yeah, stable branches are great! It would be really cool if we had them! The problem is just... someone has to step up and do it. This is a volunteer project so generally things don't get done unless someone who wants them to happen takes the time and does it. Linux has stable branch maintainers who do a lot of work pulling in all security/bugfix patches and backporting them as necessary. If we want the same for coreboot, we'll need someone to step up and do that job. Maybe patch contributors can help a bit -- e.g. in Linux, submitters add a `cc: stable` or `should be backported up to 3.4` to their commit message, which then tells the stable branch maintainers to pick that up. We could probably do something similar. But we still need someone setting up and maintaining the branch first.
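For reference, the Linux-style marking is just a commit message trailer, as sketched below; a coreboot analog would be hypothetical until such a branch exists:

    usb: Fix out-of-bounds read in descriptor parsing

    Check the descriptor length before indexing into it.

    Cc: stable@vger.kernel.org # 4.9+
    Signed-off-by: Jane Developer <jane@example.com>

The "# 4.9+" annotation tells the stable maintainers which branches the fix should be backported to.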
On 5/6/21 1:35 AM, Julius Werner wrote:
As an open source project, coreboot doesn't have anywhere near the resources to do enough QA to guarantee that the tip of the master branch (or any branch or tag, for that matter) was stable enough to be shipped in a product at any point in time... even Linux cannot do that, and they have way more resources than we do. It's always best effort, and if you want to put this on something you want to sell for money, you'll have to pick a point where you take control of it (i.e. cut a branch) and then do your own QA on that.
This is what we are doing right now.
I wonder if the community and leadership agree with the "set up your own QA" approach.
We will advocate for improved and extended QA for coreboot and any other OSF, since without it, working and doing business is a nightmare that simply blocks growth.
I mean, I don't think anyone here is going to argue against better QA, it's just hard to do in practice. This is definitely not a burden you can just push on developers -- coreboot supports hundreds of mainboards and most of us only own a handful of those. It's just not practically possible to make everyone who checks in a patch guarantee that they don't break anyone else on any board with it. We all do our best but accidents do and always will happen. The only way to get consistently more confidence in the code base is through automated systems, and those are expensive... we already have good build testing, at least, and our recent unit test efforts should also help a lot. But we don't have any real hardware testing other than those few machines 9elements sponsored which only run after the patch is merged (and which many committers don't pay attention to, I think). If you want a big lab where every patch gets tested on every mainboard, someone needs to set all that up and someone needs to pay for it. I'm actually involved in a similar thing with the trustedfirmware.org project right now who are in the process of setting up such a lab, and I'm not sure if I'm allowed to share exact numbers, but you're quickly in the range of thousands of dollars per mainboard per year just for maintaining it (to say nothing of the upfront development cost).
We are not in favor of centralization. We would advocate a decentralized approach with a known interface to which every company's lab can connect. I do not recall you discussing testing systems during OSFC, but it seems you may have a lot of important insights.
3mdeb maintains some boards; we already test those and would be glad to hook into a patch testing system in a secure way, but I would like to know where the interface documentation is, so I can evaluate the cost of integration and convince customers to go down that path. This has been expressed many times in various communication channels (conferences, Slack).
There are many reasons for rebasing or updating firmware; to name a few: security and maintainability. The second case is interesting since, if you maintain 5 projects that are all 4.12-based, it is way different than maintaining 4.0, 4.6, 4.9, etc.
3mdeb does Linux maintenance for industrial embedded systems, so we can easily compare the efforts related to coreboot and Linux maintenance. IMO Linux is doing quite well: setting up stable, LTS and SLTS (10 years of support) branches is a huge win and shows a clear understanding of expanding the project into realms where stability is a key factor. Linux can be maintained way more easily and comes with more and more QA guarantees.
So, I actually get the feeling that what you really want is well-maintained stable/LTS branches for coreboot releases (like Linux has)?
I think it can be expensive to go all-in in that direction, but if we could go in that direction it would be great.
Because for security and bug fixes in a real product, always rebasing onto master is just a bad idea in general. coreboot changes all the time, features get added, changed, serialized data formats differ... you really don't want to keep pushing all those changes onto your finished product and figure out how they affect it every time. You really just want to stick with what you have and only pull in security and bug fixes as they come up, I think.
As I said, this is the old-time UEFI firmware development approach with forks. Personally I disagree, because it seems to be an approach for quickly dying products (<5 years), not for something that should last 10+ years or more (the AMD CPU from PC Engines was introduced in 2014 and its EOL is 2024, but the product will last longer). That's why we test things and detect regressions in user-visible scenarios. Please note that we have released PC Engines firmware every month for almost 4 years, so it is definitely possible.
The question is: why does coreboot change so dramatically in all the mentioned areas? Do projects with a similar lifetime also change in such a significant way?
To that I would say: yeah, stable branches are great! It would be really cool if we had them! The problem is just... someone has to step up and do it. This is a volunteer project so generally things don't get done unless someone who wants them to happen takes the time and does it. Linux has stable branch maintainers who do a lot of work pulling in all security/bugfix patches and backporting them as necessary. If we want the same for coreboot, we'll need someone to step up and do that job. Maybe patch contributors can help a bit -- e.g. in Linux, submitters add a `cc: stable` or `should be backported up to 3.4` to their commit message, which then tells the stable branch maintainers to pick that up. We could probably do something similar. But we still need someone setting up and maintaining the branch first.
I think the Linux model is not bad and it seems to work for many projects. My point is not a sudden, dramatic switch to yet another new way; my point is to consider the problem and discuss what small steps we can take to improve things. Of course, only if we agree this is the correct direction and an important problem to solve.
P.S. We are doing a vPub tomorrow: https://twitter.com/3mdeb_com/status/1387017457118875651 Feel free to join and discuss this topic live.
Best Regards,
On Thu, May 6, 2021 at 2:03 PM Piotr Król piotr.krol@3mdeb.com wrote:
3mdeb maintains some boards; we already test those and would be glad to hook into a patch testing system in a secure way, but I would like to know where the interface documentation is, so I can evaluate the cost of integration and convince customers to go down that path. This has been expressed many times in various communication channels (conferences, Slack).
We're at the ~5th or so public test infrastructure project by now and it's still not nailed down. Part of it is that it's simply a hard problem.
Another problem is that whoever pulls this off needs to be in the very narrow intersection of having time (i.e. not a product driven coreboot developer) and having money and a few other resources (i.e. not a hobbyist), so they can run enough of the infrastructure by themselves that others who could hook into the infrastructure see the benefit.
I think it can be expensive to go all-in in that direction, but if we could go in that direction it would be great.
If you want to maintain any particular release as a long term branch, announce your intent and we'll set up a branch!
The question is: why does coreboot change so dramatically in all the mentioned areas? Do projects with a similar lifetime also change in such a significant way?
One reason is that we're dealing with the guts of an industry that is changing around us _very_ quickly. But I disagree that we're changing dramatically: on the contrary, we're pretty careful about remaining compatible in various important ways for long stretches of time to help everybody move at their own pace.
Some examples off the top of my head:
- We used to compile out strings for log levels we didn't print for space reasons. Space is now no concern and there's the cbmem console, so we leave everything in for better debugging. The remnants of the "compile out" approach, gone for ~10 years, have only been removed within the last two months.
- We used to have CBFS with a master header that defined "off limit" regions at the start and end of flash. That's fine as long as you don't regularly write to flash (where you risk blasting away parts of the CBFS structure), but these days we do write to flash, sadly, so there's now a partitioning scheme (fmap), making the master header obsolete. The header is still around, SeaBIOS still can't read FMAP.
- We added per-file metadata to CBFS in a compatible way even though the structure is a bit more complicated than it could have been if we hadn't cared about compatibility.
- The "read/write registers" code had to change because C compilers like to tighten up their rules around aliasing and volatile types and stuff like that. So we rewrite our macros into functions, with proper types, just so that a newer compiler doesn't break our entire code base.
- All that vboot/mboot/bootguard security stuff just was not a thing when coreboot started. It brought in tons of complexity: more flash partitioning, more boot stages, just "tons more code", more memory management (for example, we now have some funky "free last two malloced objects" free() implementation. We got by without free() for 15 years)
- Thunderbolt (and USB4) have some pretty arcane requirements on configuration buses. Originally LinuxBoot was supposed to set up only the bare minimum to jump into a kernel. With TBT/USB4 you can forget about that.
- More and more external complexity brought in: IOMMUs seem to add a new data structure with every chip generation, ACPI is getting ever more complex (and we can't opt out of that madness or OSes won't boot), ...
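To make the register-access item concrete, here's roughly the before/after shape of that change (a simplified sketch; the real helpers live in coreboot's arch/device headers):

    #include <stdint.h>

    /* Old style: bare macros. Fine for years, but increasingly
     * fragile as compilers tightened their aliasing/volatile rules. */
    #define READ32(addr)       (*(volatile uint32_t *)(addr))
    #define WRITE32(addr, val) (*(volatile uint32_t *)(addr) = (val))

    /* New style: typed inline functions, so access width and
     * volatility are explicit and type errors are caught. */
    static inline uint32_t read32(const void *addr)
    {
        return *(const volatile uint32_t *)addr;
    }

    static inline void write32(void *addr, uint32_t value)
    {
        *(volatile uint32_t *)addr = value;
    }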
So everything changes around us, sometimes in unexpected ways: compilers, interfaces, hardware. It would be a miracle if we didn't have to change to go along with that.
The main reason why you notice that with coreboot but not other firmware is that ancient-UEFI never gets uprevved (and while other firmware, like u-boot, collects firmware build support in their main repo, in products they're still used mostly in a copy&forget model). I just turned down a couple of older (non-coreboot) firmware build processes because they rely on python 2. They're dead, while coreboot isn't.
The main reason why it's slightly less painful with Linux (but ask any out-of-tree module maintainer!) is that chip vendors provide open source code for Linux and maintain it whenever they raise the complexity bar a notch or two, and compiler vendors (gcc, clang) are coordinating against Linux (while they usually don't really care about coreboot).
tl;dr: All things considered, we're a pretty small project punching _way_ above our weight, working sometimes against the interests of other parties in the ecosystem who'd prefer to keep things closed that we open. Since (at this time) we can't offload the pain to those who inflict it on us the way Linux is doing, we'll have to bear it.
Regards, Patrick
On 5/6/21 2:43 PM, Patrick Georgi wrote:
On Thu, May 6, 2021 at 2:03 PM Piotr Król <piotr.krol@3mdeb.com> wrote:
3mdeb maintains some boards; we already test those and would be glad to hook into a patch testing system in a secure way, but I would like to know where the interface documentation is, so I can evaluate the cost of integration and convince customers to go down that path. This has been expressed many times in various communication channels (conferences, Slack).
We're at the ~5th or so public test infrastructure project by now and it's still not nailed down. Part of it is that it's simply a hard problem.
Wow, I'm surprised this is already the ~5th attempt.
I agree the problem is hard, but not impossible to solve. During my corporate career I was involved a couple of times in building validation infrastructure for storage controllers (including UEFI applications and Option ROMs) as well as for UEFI/PI implementations for Intel servers. With that past experience we built 3mdeb's validation infrastructure, which we will keep moving forward.
Another problem is that whoever pulls this off needs to be in the very narrow intersection of having time (i.e. not a product driven coreboot developer) and having money and a few other resources (i.e. not a hobbyist), so they can run enough of the infrastructure by themselves that others who could hook into the infrastructure see the benefit.
I'm not sure which group 3mdeb falls into, but I don't understand the argument about running a significant ("enough") part of the infrastructure. Why shouldn't the maintainers of platforms run the part of the infrastructure that supports those platforms?
I think it can be expensive to go all-in in that direction, but if we could go in that direction it would be great.
If you want to maintain any particular release as a long term branch, announce your intent and we'll set up a branch!
I'm not sure what benefit it would give to the community or to us, but we will probably have to maintain v4.0.x until the PC Engines hardware EOL.
The question is: why does coreboot change so dramatically in all the mentioned areas? Do projects with a similar lifetime also change in such a significant way?
One reason is that we're dealing with the guts of an industry that is changing around us _very_ quickly.
I think other projects, like hypervisors, Linux, or other strongly hardware-related ones, feel the same.
But I disagree that we're changing dramatically: on the contrary, we're pretty careful about remaining compatible in various important ways for long stretches of time to help everybody move at their own pace.
Some examples off the top of my head:
- We used to compile out strings for log levels we didn't print for space reasons. Space is now no concern and there's the cbmem console, so we leave everything in for better debugging. The remnants of the "compile out" approach, gone for ~10 years, have only been removed within the last two months.
- We used to have CBFS with a master header that defined "off limit" regions at the start and end of flash. That's fine as long as you don't regularly write to flash (where you risk blasting away parts of the CBFS structure), but these days we do write to flash, sadly, so there's now a partitioning scheme (fmap), making the master header obsolete. The header is still around, SeaBIOS still can't read FMAP.
- We added per-file metadata to CBFS in a compatible way even though the structure is a bit more complicated than it could have been if we hadn't cared about compatibility.
- The "read/write registers" code had to change because C compilers like to tighten up their rules around aliasing and volatile types and stuff like that. So we rewrite our macros into functions, with proper types, just so that a newer compiler doesn't break our entire code base.
- All that vboot/mboot/bootguard security stuff just was not a thing when coreboot started. It brought in tons of complexity: more flash partitioning, more boot stages, just "tons more code", more memory management (for example, we now have some funky "free last two malloced objects" free() implementation. We got by without free() for 15 years)
- Thunderbolt (and USB4) have some pretty arcane requirements on configuration buses. Originally LinuxBoot was supposed to set up only the bare minimum to jump into a kernel. With TBT/USB4 you can forget about that.
- More and more external complexity brought in: IOMMUs seem to add a new data structure with every chip generation, ACPI is getting ever more complex (and we can't opt out of that madness or OSes won't boot), ...
OK, I agree that "drama" is an exaggeration here, but it is just a reflection of the feeling we get when we suddenly receive 20 emails yelling at us that a platform will be removed from the tree and that we should go into panic mode to prevent it. Or, on the other side, complaints that upstream behaves differently than our releases.
I know that the changes coming to the tree have important reasons, and I'm not arguing that there should be no tree-wide changes. What I'm really trying to do is highlight various problems 3mdeb has seen over 6 years of coreboot development.
So everything changes around us, sometimes in unexpected ways: compilers, interfaces, hardware. It would be a miracle if we didn't have to change to go along with that.
Toolchain stability and reproducibility is something I have discussed before. I even started to write something up here: https://docs.dasharo.com/osf-trolling-list/build_process/
The main reason why you notice that with coreboot but not other firmware is that ancient-UEFI never gets uprevved (and while other firmware, like u-boot, collects firmware build support in their main repo, in products they're still used mostly in a copy&forget model). I just turned down a couple of older (non-coreboot) firmware build processes because they rely on python 2. They're dead, while coreboot isn't.
Yeah, our coreboot-sdk has a problem with shipping only python3; SeaBIOS does not understand that: https://github.com/coreboot/seabios/blob/master/Makefile#L25
At least this was a problem on the 4.13 tag.
The main reason why it's slightly less painful with Linux (but ask any out-of-tree module maintainer!) is that chip vendors provide open source code for Linux and maintain it whenever they raise the complexity bar a notch or two, and compiler vendors (gcc, clang) are coordinating against Linux (while they usually don't really care about coreboot).
tl;dr: All things considered, we're a pretty small project punching _way_ above our weight, working sometimes against the interests of other parties in the ecosystem who'd prefer to keep things closed that we open. Since (at this time) we can't offload the pain to those who inflict it on us the way Linux is doing, we'll have to bear it.
This last comment touches on a very important thing, IMO. It is the 10k-player vs. small-player case: if we could fill the gap in between by growing the ecosystem, we would be in a much better position, and I believe there is a lot of room to grow considering the UEFI market. The question is who has the motivation to fill that gap.
Best Regards,
Piotr Król wrote:
I don't understand the argument about running a significant ("enough") part of the infrastructure. Why shouldn't the maintainers of platforms run the part of the infrastructure that supports those platforms?
I think this is a key point. It's a lot easier to develop centralized solutions, so that's a common mistake. And even if not strictly or intentionally centralized then there's often at least a knowledge gap, case in point your request for more integration information.
If you want to maintain any particular release as a long term branch, announce your intent and we'll set up a branch!
I'm not sure what benefit it would give to the community or to us, but we will probably have to maintain v4.0.x until the PC Engines hardware EOL.
I think it would be fantastic if that happened on coreboot.org!
Maybe it would not benefit you in any way immediately, but it would probably not be a disadvantage for you either, and it would for sure look great for the project as a whole.
And maybe, just maybe, some colleagues will pitch in because they see value in what is then in practice a stable branch.
What I'm really trying to do is highlight various problems 3mdeb has seen over 6 years of coreboot development.
That's super valuable and I much appreciate your input!
Toolchain stability and reproducibility is something I have discussed before. I even started to write something up here: https://docs.dasharo.com/osf-trolling-list/build_process/
Yes. Docker is a dark pattern, yet like `curl | bash` it prevails.
Yeah, our coreboot-sdk has a problem with shipping only python3; SeaBIOS does not understand that: https://github.com/coreboot/seabios/blob/master/Makefile#L25
At least this was a problem on the 4.13 tag.
I appreciate that SeaBIOS likes to still support python2, but when all distributions choose to end support for python2 it'll take more effort in SeaBIOS. It could be simple though; maybe the attached patch is already enough? It looks like all python scripts are called through $(PYTHON) rather than executed directly, so the #! line doesn't actually matter so much.
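(The attachment isn't preserved in the archive; as a guess at its spirit, the fix could be as small as making the interpreter choice lazy, assuming the $(PYTHON) variable mentioned above:)

    # Prefer python3 when available, fall back to python2, then python.
    PYTHON ?= $(shell command -v python3 || command -v python2 || echo python)

That would let coreboot-sdk's python3-only environment and older python2 hosts both work unmodified.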
//Peter
Hi Piotr,
it feels like we are discussing the wrong things here. I've also looked at the Gerrit discussion[1]. We are discussing solutions, while the root cause is not understood yet.
On 06.05.21 00:13, Piotr Król wrote:
There are many reasons for rebasing or updating firmware; to name a few: security and maintainability. The second case is interesting since, if you maintain 5 projects that are all 4.12-based, it is way different than maintaining 4.0, 4.6, 4.9, etc.
Rebasing for security seems odd. Usually one has to re-evaluate security completely after a rebase. In my experience, security features randomly break upstream like everything else. There is no stability guarantee.
Rebasing for maintainability is very vague. Throughout this discussion, it seemed that rebasing itself is the maintainability issue?
I've seen some company size arguments on Gerrit. From my perspective, it doesn't get much smaller than the firmware department I work for. That's filled by about 40% of my time and beside some help from students nobody else. In this situation it turns out that the only strategy that scales is to upstream as much as possible.
So, what do we do: Of course, we can't upstream every last bit. So we maintain local branches per product. On top of upstream, that usually contains the patches to add a new board which we try to upstream (I'm much behind lately, I have to admit) and maybe 10~20 commits that are too specific to upstream. Most common cause of a rebase is that we need support for a new platform. If the last board was upstreamed in the meantime, that only leaves these 10~20 commits to rebase. And these usually are local enough (e.g. our own board ports, utils) to not conflict with any tree-wide changes.
There's a wrinkle: To upstream as much as possible, this often includes changes that affect all boards of a new platform. That's why I'm arguing against making such changes harder. My humble theory: Upstreaming is hard, hence we maintain more downstream patches, hence it's harder to rebase. If we now make upstreaming harder to ease the latter, we'll find ourselves in a vicious circle.
You mentioned 4k LOC of downstream patches on Gerrit. Maybe we should try to figure out case-by-case what led to keeping them downstream? Maybe we can find upstream solutions for some of them?
Nico
[1] https://review.coreboot.org/c/coreboot/+/52576/1/Documentation/getting_start...
Nico Huber wrote:
Another idea brought up was to require that such changes come with documentation and, ideally, migration support in the form of scripts and the like. We had something like this in the past[2] and I created a proposal[3] to establish it as a rule and build a culture around documenting sweeping changes.
I think it's nice to use a script and if one does, sharing it is obviously a good thing. But demanding that would again burden upstream development.
Do you place spatch in this category?
I mean, do you see it as too burdensome to mandate that changes affecting the tree more than some TBD threshold are to be generated by an spatch which must also be contributed?
Clearly it is one step more to create that spatch instead of just changing all the files, but in my experience it pays off very quickly; it will never miss any files and it pays forward, helping others in the community who share the goal but can't yet push/publish.
Not only does coccinelle understand the C language but it's indeed built for the very purpose of refactoring C code across large codebases. I admit that I am biased though; I find semantic patching quite fun. :)
Werner mentioned that it might take longer to write an spatch than change something in two boards. If we do decide to try to reduce efforts through spatches then we could start out by having a high-ish threshold for what changes must include an spatch and also set a sensible date for review of outcomes of the experiment?
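(For scale, the argument-flipping case Werner mentioned is about as small as spatches get; with a hypothetical foo(), the whole semantic patch is:)

    @@
    expression a, b;
    @@
    - foo(a, b)
    + foo(b, a)

So perhaps the threshold question is less about spatch size and more about whether a change is mechanical enough to express at all.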
I think we first need a very precise description of the problem.
I think that's fair, although I can imagine upstream changes causing problems when working on a board for a newish platform, where upstream platform support is still moving around.
My guess is that somebody is trying to rebase downstream work rather often. Is that the case? If so, I'd ask what the reasons for each rebase are.
If so, I guess to not stray too far from master.
Nico Huber wrote:
I've seen some company size arguments on Gerrit. From my perspective, it doesn't get much smaller than the firmware department I work for.
Thanks for sharing your situation!
Are your boards usually on the newest platforms, or more often on more mature platforms?
There's a wrinkle: To upstream as much as possible, this often includes changes that affect all boards of a new platform. That's why I'm arguing against making such changes harder.
So I have to ask again, do you really mean that requiring spatches is "making such changes harder" ?
You mentioned 4k LOC of downstream patches on Gerrit. Maybe we should try to figure out case-by-case what led to keeping them downstream? Maybe we can find upstream solutions for some of them?
Certainly a good idea!
Thanks
//Peter
On 5/6/21 11:34 PM, Nico Huber wrote:
Hi Piotr,
Hi Nico,
it feels like we are discussing the wrong things here. I've also looked at the Gerrit discussion[1]. We are discussing solutions, while the root cause is not understood yet.
Of course it was not my intention to push any particular solution. What I tried to do was bring the 3mdeb perspective on the Gerrit documentation proposal and discuss how the problems we see can be addressed.
On 06.05.21 00:13, Piotr Król wrote:
There are many reasons for rebasing or updating firmware; to name a few: security and maintainability. The second case is interesting since, if you maintain 5 projects that are all 4.12-based, it is way different than maintaining 4.0, 4.6, 4.9, etc.
Rebasing for security seems odd. Usually one has to re-evaluate security completely after a rebase. In my experience, security features randomly break upstream like everything else. There is no stability guarantee.
Maybe it is odd, but backporting Intel Boot Guard or vboot to an old branch and supporting it there seems equally odd. I also had in mind security bug fixes, which may also not be easy to backport in light of massive tree changes and the lack of QA to confirm that things work the same way. Of course, security bug fixes would be way easier to backport than features.
Rebasing for maintainability is very vague. Throughout this discussion, it seemed that rebasing itself is the maintainability issue?
The question I was answering was from Julius: "why on Earth would you want to rebase that onto the latest master after you have stabilized?"
Typically we do not rebase on master but on tags, but it doesn't matter, since "There is no stability guarantee".
The suggestion was that we should do branches, which means we end up with 4.0.x, 4.6.x (where x is our patch on top of the tag) etc. and backport whatever is needed. My point was that the maintainability cost is way lower when you have 5 projects on 4.12 instead of spread across tags, since backporting is less expensive. That of course means you have to rebase older code to some "stable" point (e.g. 4.12); it doesn't have to be upstream HEAD. Of course this leads to LTS again.
I've seen some company size arguments on Gerrit. From my perspective, it doesn't get much smaller than the firmware department I work for. That's filled by about 40% of my time and beside some help from students nobody else. In this situation it turns out that the only strategy that scales is to upstream as much as possible.
I already addressed that in the Gerrit discussion. I agree with that approach.
IMO the feedback we are gathering in this discussion could land in the documentation, to provide information on how development should be handled in the various situations we are trying to cover. At least we would then not waste time on this kind of discussion, but could point to documentation where maintenance best practices are described.
So, what do we do: Of course, we can't upstream every last bit. So we maintain local branches per product. On top of upstream, that usually contains the patches to add a new board which we try to upstream (I'm much behind lately, I have to admit) and maybe 10~20 commits that are too specific to upstream. Most common cause of a rebase is that we need support for a new platform. If the last board was upstreamed in the meantime, that only leaves these 10~20 commits to rebase. And these usually are local enough (e.g. our own board ports, utils) to not conflict with any tree-wide changes.
There's a wrinkle: To upstream as much as possible, this often includes changes that affect all boards of a new platform. That's why I'm arguing against making such changes harder. My humble theory: Upstreaming is hard, hence we maintain more downstream patches, hence it's harder to rebase. If we now make upstreaming harder to ease the latter, we'll find ourselves in a vicious circle.
Clearly the perception here is that my emails try to argue for making upstreaming harder. I'm not trying to do that. When I started the discussion I mentioned it is a little bit off-topic for tree-wide changes, but I was asked to bring my comments from Gerrit here.
Concluding: I'm promoting distributed QA and a stable development point with some quality guarantees in the form of a QA report. I know there are not enough resources for that, but I decided that the discussion about tree-wide changes is a good point to slowly start moving that forward.
You mentioned 4k LOC of downstream patches on Gerrit. Maybe we should try to figure out case-by-case what led to keeping them downstream?
I enumerated that on Gerrit; those are some examples. We can move the discussion here if you want. I'm not claiming those are unupstreamable patches; maybe we didn't try hard enough or simply didn't have enough resources.
Maybe we can find upstream solutions for some of them?
I'm pretty sure we can, but I believe some points from the discussion will still be valid despite that.
Best Regards,
Rebasing for security seems odd. Usually one has to re-evaluate security completely after a rebase. In my experience, security features randomly break upstream like everything else. There is no stability guarantee.
Maybe it is odd, but backporting Intel Boot Guard or vboot to an old branch and supporting it there seems equally odd. I also had in mind security bug fixes, which may also not be easy to backport in light of massive tree changes and the lack of QA to confirm that things work the same way. Of course, security bug fixes would be way easier to backport than features.
Well, okay, I don't think that's what anyone here meant when we said "backport security fixes". I meant actual bug fixes, like there was a missed size check leading to a potential buffer overflow somewhere -- that's something that you can relatively easily backport most of the time. And for that, maintaining stable branches so not everybody has to do the tracking and backporting on their own would be great (if someone has the time to do it).
But of course you can't backport things like vboot or BootGuard support to an older branch -- those are huge features that dig deep into coreboot internals in many places. Those features are exactly the kind of things that require these tree-wide API changes that this discussion started about. So... I'm not really sure what you want here, tbh. If you want to get all these big new features on your board, then you should forward-port your out-of-tree patches to a newer release, and you'll have to deal with the problems caused by all the big API changes. If you don't want to deal with big API changes, then you should keep your stuff on a stabilization branch and only backport specific bug fixes that you need -- in that case, you'll of course not get any big new features. I understand that you might like to have both but I think that's just fundamentally impossible -- big new features just tend to require deep, invasive changes.
Do you place spatch in this category?
I mean, do you see it as too burdensome to mandate that changes affecting the tree more than some TBD threshold are to be generated by an spatch which must also be contributed?
I think we could encourage that, I don't think it's really something you can make a hard requirement. spatches just don't work well for all kinds of API changes. Starting this as a sort of "experiment" like you suggested to see how it goes sounds like a good idea.
On Sat, May 8, 2021 at 3:08 AM Julius Werner <jwerner@chromium.org> wrote:
I understand that you might like to have both [features and stability] but I think that's just fundamentally impossible -- big new features just tend to require deep, invasive changes.
+1
I think we could encourage that, I don't think it's really something you can make a hard requirement. spatches just don't work well for all kinds of API changes. Starting this as a sort of "experiment" like you suggested to see how it goes sounds like a good idea.
This is why my proposed documentation change ( https://review.coreboot.org/c/coreboot/+/52576/1/Documentation/getting_start...) states: "Providing a script or a [coccinelle]( https://coccinelle.gitlabpages.inria.fr/website/) semantic patch to automate this step is extra helpful, so consider doing that if possible."
The only "shall do" request is that there's _some_ documentation about what has been going on, and that can be as short as "commit abc replaced foo(a,b) with bar(b,a)."
Do I need to emphasize the "if possible" part some more?
Patrick
Dear Piotr,
Thank you for bringing up these issues.
On 28.04.21 00:38, Piotr Król wrote:
On 4/21/21 8:33 PM, Patrick Georgi via coreboot wrote:
In our leadership meeting[1] we discussed how we should deal with tree-wide changes (ranging from "new file header" to "some API is gone now"). The concern was that right now, anything can change at any time, making local development harder.
I already added a comment on Gerrit: https://review.coreboot.org/c/coreboot/+/52576/comment/4033eba1_56d6eab5/
There have been a few ideas but nothing definite yet:
One idea was to declare some interfaces (e.g. those used by src/mainboards/**), with varying stability horizons (e.g. "only change right after a release"), or backward compatibility guarantees (although the details still seemed hazy on how that could work in practice.)
The initial point was related to long-term development on a fork, but based on the changes proposed by Patrick I wanted to raise other concerns.
Any guarantees should have some anchor, e.g. a release version. At this point we all agree that coreboot release points are chosen arbitrarily and provide no quality or API compatibility guarantees. Although this is clearly stated in the documentation, not many people outside the community know that.
From an embedded systems consulting perspective (we have seen coreboot applied in e.g. trains and medical robots), long-term support and some API compatibility are needed. The cost of a massive rebase of patches from some old, or sometimes not so old, version may not be feasible; that's how some customers end up going back to IBVs.
What worries me is that the dislike of backward compatibility, and the ease of throwing away "redundant" baggage of code that blocks tree-wide changes, makes coreboot harder to maintain in the long run for some applications.
This is one part of the problem; the other is specification compatibility, where ACPI is one area that breaks things often. coreboot moves to newer ACPI compiler versions faster than most BSD systems do, which has led to problems with BSD-based firewalls.
Please excuse my ignorance. I still don't fully understand the problem, so it'd be great if more concrete examples could be given. For example, what ACPI change caused an OS problem?
I would have thought that the "payload interface" and coreboot tables are the main problems.
Kind regards,
Paul
On 5/8/21 9:24 AM, Paul Menzel wrote:
Dear Piotr,
Hi Paul,
Thank you for bringing up these issues.
On 28.04.21 00:38, Piotr Król wrote:
(...)
This is one part of the problem; the other is specification compatibility, where ACPI is one area that breaks things often. coreboot moves to newer ACPI compiler versions faster than most BSD systems do, which has led to problems with BSD-based firewalls.
Please excuse my ignorance. I still don't fully understand the problem, so it'd be great if more concrete examples could be given. For example, what ACPI change caused an OS problem?
1. ACPI CPU https://review.coreboot.org/c/coreboot/+/36258 Here we tried to reduce the problem: https://review.coreboot.org/c/coreboot/+/36258 The above patch introduces compliance with the most recent version of the ACPI spec, breaking BSD in the same way (FreeBSD < 12.2 and < 13, and most downstream firewall distros based on older FreeBSD).
2. ACPI hostbridge https://review.coreboot.org/c/coreboot/+/36318 which was later reverted: https://review.coreboot.org/c/coreboot/+/37710 We also tried to fix the problem according to the spec: https://review.coreboot.org/c/coreboot/+/37835 but this caused PCI detection issues.
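To illustrate the CPU case: ACPI 6.x deprecated the Processor() operator in favor of Device() objects with _HID "ACPI0007", and FreeBSD only learned to match the new form in 12.2/13. Roughly, in ASL (an illustrative sketch, not the exact contents of the CLs):

    // Legacy declaration, which older FreeBSD enumerates:
    Scope (\_PR) {
        Processor (CP00, 0, 0x00000410, 0x06) {}
    }

    // Current-spec declaration, invisible to older FreeBSD:
    Scope (\_SB) {
        Device (CP00) {
            Name (_HID, "ACPI0007")
            Name (_UID, 0)
        }
    }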
Those are examples of problems we faced with ACPI.
I would have thought that the "payload interface" and coreboot tables are the main problems.
Agree.
Best Regards,
Dear Piotr,
On 10.05.21 10:51, Piotr Król wrote:
On 5/8/21 9:24 AM, Paul Menzel wrote:
On 28.04.21 00:38, Piotr Król wrote:
(...)
This is one part of the problem; the other is specification compatibility, where ACPI is one area that breaks things often. coreboot moves to newer ACPI compiler versions faster than most BSD systems do, which has led to problems with BSD-based firewalls.
Please excuse my ignorance. I still don't fully understand the problem, so it'd be great if more concrete examples could be given. For example, what ACPI change caused an OS problem?
- ACPI CPU
https://review.coreboot.org/c/coreboot/+/36258 Here we tried to reduce the problem: https://review.coreboot.org/c/coreboot/+/36258
That is the same URL. I wasn’t able to find the ID or the commit hash in any commit messages in the master branch, and there is no comment in CB:36258 referencing the fixup.
The above patch introduces compliance with the most recent version of the ACPI spec, breaking BSD in the same way (FreeBSD < 12.2 and < 13, and most downstream firewall distros based on older FreeBSD).
- ACPI hostbridge
https://review.coreboot.org/c/coreboot/+/36318 which was later reverted: https://review.coreboot.org/c/coreboot/+/37710 We also tried to fix the problem according to the spec: https://review.coreboot.org/c/coreboot/+/37835 but this caused PCI detection issues.
Those are examples of problems we faced with ACPI.
I would have thought that the "payload interface" and coreboot tables are the main problems.
Agree.
Kind regards,
Paul
On 5/10/21 1:45 PM, Paul Menzel wrote:
Dear Piotr,
On 10.05.21 10:51, Piotr Król wrote:
On 5/8/21 9:24 AM, Paul Menzel wrote:
On 28.04.21 00:38, Piotr Król wrote:
(...)
This is one part of the problem; the other is specification compatibility, where ACPI is one area that breaks things often. coreboot moves to newer ACPI compiler versions faster than most BSD systems do, which has led to problems with BSD-based firewalls.
Please excuse my ignorance. I still don't fully understand the problem, so it'd be great if more concrete examples could be given. For example, what ACPI change caused an OS problem?
- ACPI CPU
https://review.coreboot.org/c/coreboot/+/36258 Here we tried to reduce the problem: https://review.coreboot.org/c/coreboot/+/36258
That is the same URL. I wasn’t able to find the ID or the commit hash in any commit messages in the master branch, and there is no comment in CB:36258 referencing the fixup.
Sorry, I meant this: https://review.coreboot.org/c/coreboot/+/39698
The above patch introduces compliance with the most recent version of the ACPI spec, breaking BSD in the same way (FreeBSD < 12.2 and < 13, and most downstream firewall distros based on older FreeBSD).
- ACPI hostbridge
https://review.coreboot.org/c/coreboot/+/36318 which was later reverted: https://review.coreboot.org/c/coreboot/+/37710 We also tried to fix the problem according to the spec: https://review.coreboot.org/c/coreboot/+/37835 but this caused PCI detection issues.
Those are examples of problems we faced with ACPI.
I would have thought that the "payload interface" and coreboot tables are the main problems.
Agree.
Kind regards,
Paul
Dear Piotr,
On 10.05.21 13:51, Piotr Król wrote:
On 5/10/21 1:45 PM, Paul Menzel wrote:
On 10.05.21 10:51, Piotr Król wrote:
On 5/8/21 9:24 AM, Paul Menzel wrote:
On 28.04.21 00:38, Piotr Król wrote:
(...)
This is one part of the problem; the other is specification compatibility, where ACPI is one area that breaks things often. coreboot moves to newer ACPI compiler versions faster than most BSD systems do, which has led to problems with BSD-based firewalls.
Please excuse my ignorance. I still don't fully understand the problem, so it'd be great if more concrete examples could be given. For example, what ACPI change caused an OS problem?
- ACPI CPU
https://review.coreboot.org/c/coreboot/+/36258 Here we tried to reduce the problem: https://review.coreboot.org/c/coreboot/+/36258
That is the same URL. I wasn’t able to find the ID or the commit hash in any commit messages in the master branch, and there is no comment in CB:36258 referencing the fixup.
Sorry, I meant this: https://review.coreboot.org/c/coreboot/+/39698
Thank you.
The above patch introduces compliance with the most recent version of the ACPI spec, breaking BSD in the same way (FreeBSD < 12.2 and < 13, and most downstream firewall distros based on older FreeBSD).
- ACPI hostbridge
https://review.coreboot.org/c/coreboot/+/36318 which was later reverted: https://review.coreboot.org/c/coreboot/+/37710 We also tried to fix the problem according to the spec: https://review.coreboot.org/c/coreboot/+/37835 but this caused PCI detection issues.
Just to be sure, with your improvements, these issues were fixed in a backward compatible way, right?
Those are examples of problems we faced with ACPI.
To me it looks like these tree-wide ACPI changes just missed some corner cases that would have been caught with run-time testing on more boards? It's good to point them out though, as ACPI (as the name suggests) is an interface to "the OS".
I would have thought that the "payload interface" and coreboot tables are the main problems.
Agree.
Sorry, one more question. Could you please elaborate on why you are locked to v4.0.x and v4.6.0, as commented in [1]?
Kind regards,
Paul
[1]: https://review.coreboot.org/c/coreboot/+/52576/1/Documentation/getting_start...