Disagree, but also what do you classify as local storage? Does the repo “size” include all projects or just one? What about multiple branches? How much capacity is local storage?
A stock Unreal Engine project is several hundred gigs, consists of multiple solutions and multiple languages, and I would personally classify it as large.
Without some kind of indexing it’s very awkward and very slow to work with. To work with LLMs and Unreal projects we create a local index; that index file alone is 46GB.
Without distributed compilers and caches it can take multiple hours to compile the main solution per platform (usually PC, Linux, Xbox, PlayStation, Switch, and sometimes mobile).
So the codebase easily fits on local storage so long as you don’t count assets (those are several TB), and especially not source assets (tens of TB), and that’s per stream, per large project.
Anyways, point is I disagree and think Unreal Engine is an example of a large codebase that fits locally.
If your codebase can’t fit on a single developer’s machine, it’s too big.
You mean like Tesla's multi-terabyte repo is not normal?
I think it's obvious that multi terabyte repos are not the norm.
How did they even manage to generate a terabyte-sized repo? That's crazy. Do they have something written up on how it's structured and why they'd even go that route?
A terabyte is ~220 thousand books (1000 pages, 50 rows, 100 columns) uncompressed. VCS generally store objects compressed.
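For anyone who wants to check that figure, here is a rough back-of-the-envelope sketch. It assumes one byte per character (my assumption, not stated above), and it shows that the ~220 thousand number corresponds to a binary terabyte (TiB); with a decimal terabyte you get 200 thousand.

```python
# Back-of-the-envelope check of the "books per terabyte" estimate.
# Assumption: one byte per character (plain ASCII text, no compression).
PAGES, ROWS, COLS = 1000, 50, 100

bytes_per_book = PAGES * ROWS * COLS  # 5,000,000 bytes, i.e. about 5 MB per book

TB = 10**12   # decimal terabyte
TIB = 2**40   # binary terabyte (tebibyte)

books_per_tb = TB // bytes_per_book
books_per_tib = TIB // bytes_per_book

print(f"books per TB:  {books_per_tb:,}")
print(f"books per TiB: {books_per_tib:,}")
```

Either way the order of magnitude is the same, and since text compresses well, a terabyte of compressed objects would hold even more.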
It couldn’t be broken into domain-specific components?
Listen, I am a rails developer, so a monolith doesn’t scare me, and yet, there are limits. Why does it need to be a multi terabyte monolith?
Even a large AAA game should be able to be cloned to a machine. You don't need to clone history, just use --depth to specify the number of commits you want.
Obviously a 20 year git repo with all commits is going to be massive, but you don't need that locally.
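As a minimal sketch of the `--depth` suggestion, this is roughly what a shallow clone looks like when scripted. The helper names, URL, and destination directory are hypothetical; only `git clone --depth <n>` itself is a real git option.

```python
import subprocess


def shallow_clone_cmd(url: str, dest: str, depth: int = 1) -> list[str]:
    """Build the argument list for a shallow clone (hypothetical helper)."""
    if depth < 1:
        raise ValueError("depth must be at least 1")
    # --depth truncates history to the given number of most recent commits.
    return ["git", "clone", "--depth", str(depth), url, dest]


def shallow_clone(url: str, dest: str, depth: int = 1) -> None:
    """Run the shallow clone; raises CalledProcessError if git fails."""
    subprocess.run(shallow_clone_cmd(url, dest, depth), check=True)


# Example with a placeholder URL; prints the command that would be run.
print(" ".join(shallow_clone_cmd("https://example.com/game.git", "game", depth=1)))
```

Note that this only trims history. The working tree for the checked-out revision still has to fit on disk.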
Also, it seems like it would be reasonable for a AAA game to version control assets separately from code.
That's probably mostly assets, no?
Probably, but you want to version control assets too.
People usually mention git-lfs at this point, but that is always annoying to use in practice. There are also shallow clones and sparse checkouts, but these only mitigate the problem, as there is no way around cloning at least one revision completely with git.
My last project was about 400 GB, and probably 2M lines of C++. The data size is mostly assets, but there’s still a lot of code.
If you can't clone it, it's not a repo.