Hi,
since various automated systems started submitting board status files, the board-status repository has grown to the point where submitting a new board report from scratch requires a download that is bigger than the whole coreboot repository. Once I hook up four more systems in my lab, I expect this growth to accelerate quite a bit. It will get even more extreme once I start testing every commit instead of limiting the tests to one per hour.
This poses three problems:
1. Download size for people wanting to submit new logs. Can be solved with shallow git clones of the board-status repository.
2. Download size for people wanting to find out where some breakage happened. Not really solvable unless we commit fewer reports, and that makes bisection harder.
3. Server load for anything working with the board-status repo. This means gitweb and the wiki, and that's not a problem (yet).
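For the first problem, a shallow clone could look roughly like this (the remote URL is an assumption here; substitute the actual board-status remote):

```shell
# --depth 1 fetches only the newest commit, not the full history,
# which is enough to add a new report on top.
git clone --depth 1 https://review.coreboot.org/board-status.git
cd board-status
# Only one commit is present locally in a depth-1 clone:
git rev-list --count HEAD
# Adding files, committing, and pushing then works as usual.
```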
Should we just ignore the size until it becomes too large, is size not a problem, or should we do something immediately?
Regards, Carl-Daniel
Maybe using git for a monotonically increasing entry count is the wrong way to go about it. That's beyond the scope, mission, design and goal of git.
I would rethink the entire board-status mechanism to something more suited to the task than git.
Yours truly, The Devil
On 02/17/2016 01:07 PM, Carl-Daniel Hailfinger wrote:
It's definitely an issue, and it's actually something that is currently being looked into. As always, we just need someone to do the work. :)
There's a project listed on the project ideas wiki page about this, so maybe we'll get an enterprising person to work on it. If not, we'll probably continue with git for now, but put a web front-end on it so that the project doesn't need to be downloaded to add to the database. Here's a link to it in the wiki: https://www.coreboot.org/Project_Ideas#coreboot_mainboard_test_suite_reporti...
I think we all agree that git isn't the right way to do this, but it was infrastructure we had in place, so it was an easy way to add what was needed at the time. We're reaching a pain point on that, so it's definitely time to do SOMETHING about it.
Martin
On Wed, Feb 17, 2016 at 2:54 PM, Alex G. mr.nuke.me@gmail.com wrote:
-- coreboot mailing list: coreboot@coreboot.org http://www.coreboot.org/mailman/listinfo/coreboot
Hi Carl-Daniel,
Since last week I've been shopping around for options with people who know a thing or two about storing and processing large amounts of data.
For the problem of submitters having to download the repo, I was thinking about setting up a relatively simple web service frontend that allows pushing files, which are then integrated into the git repo on the server side. It should be possible to implement that with little effort and no effect on other parts of the existing infrastructure. The status submission scripts could then use curl to PUT files, with no download necessary; authentication would go through gerrit's HTTP auth tokens (which can easily be verified against the actual gerrit instance).
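A submission via such a service could be as simple as the following sketch; the endpoint URL, path layout, and credential variables are assumptions about a service that does not exist yet, not a description of current infrastructure:

```shell
# Upload a single report file via HTTP PUT (-T/--upload-file sends a PUT),
# authenticating with a gerrit username and HTTP token.
# GERRIT_USER, GERRIT_HTTP_TOKEN, and the URL are placeholders.
curl --user "$GERRIT_USER:$GERRIT_HTTP_TOKEN" \
     --upload-file build_timeless.txt \
     https://review.coreboot.org/board-status/upload/
```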
A later step would be moving the data into a more suitable data store (that's what I'm currently looking into); this could be done transparently to the status submission scripts (as long as the web service's endpoint remains the same), and would provide some way to build server-side and cached queries. That should also help with the bisection issue, by avoiding downloading the entire data set in the first place. Of course, a complete download still needs to be possible; it just shouldn't be necessary for every single query.
Regarding server load, gitweb/cgit only touch repos as they're accessed by users, and they scale reasonably. Since the wiki page mostly cares about the newest entry per board, the script that creates it can surely be reworked to scale primarily with the number of boards present, making the number of reports a minor factor.
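The per-board approach could be sketched like this (the vendor/board directory layout is an assumption about how reports are organized in the repo):

```shell
# Ask git only for the newest commit touching each board directory,
# so the work scales with the number of boards rather than the total
# number of report commits.
for board in */*/; do
    git log -1 --date=short --format="%ad $board" -- "$board"
done
```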
Patrick