This free tradition in software is, I think, one of the things I love most about it, but I don't see how it can continue with LLMs, given the extremely high training costs and the powerful hardware required for inference. It just seems like writing software will necessarily require paying rent to the LLM hosts to keep up. I guess it's possible that we'll figure out how to make local inference as accessible to everyone as most other modern software tools are, but the high training costs make that seem unlikely to me.
I also worry that as we rely on LLMs more and more, we will stop producing the kind of tutorials and other content aimed at beginners that makes it so easy to pick up programming the manual way.
There's a Stephen Boyd quote that's something like "if your optimization problem is too computationally expensive, just go on vacation to Greece for a few weeks and by the time you get back, computers might be fast enough to solve it." With LLMs there's sort of an equivalent situation with cost: how mindblowing would it have been to be able to train this kind of LLM at all even just 4 years ago? And today you can get a kindergartener level chat model for about $100. Not hard to imagine the same model costing $10 of compute in a few years.
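To put a rough number on that: a minimal sketch, assuming "a few years" means three (that figure is my assumption, not something from the quote), of the constant yearly price drop needed to get from $100 to $10. The function name `annual_cost_factor` is made up for illustration.

```python
def annual_cost_factor(start_cost, end_cost, years):
    """Constant yearly multiplier that takes start_cost to end_cost in `years` years."""
    return (end_cost / start_cost) ** (1.0 / years)


if __name__ == "__main__":
    # Assumption: "a few years" is read as 3 years.
    factor = annual_cost_factor(100.0, 10.0, 3)
    print(f"cost multiplier per year: {factor:.3f}")
    print(f"price drop per year: {(1 - factor) * 100:.1f}%")
```

Under that assumption the same model would have to get a bit more than twice as cheap to train every year.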
There's also a reasonable way to "leapfrog" the training cost with a pre-trained model. So if you were doing nanochat as a learning exercise and had no money, the idea would be to code it up, run one or two very slow gradient descent iterations on your slow machine to make sure it is working, then download a pre-trained version from someone who could spare the compute.
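The "run one or two iterations to make sure it is working" step might look something like this. It's a minimal sketch in plain Python, not nanochat's actual code: a toy bigram character model (just a table of logits, standing in for a real network) gets two plain gradient descent steps, and you check that the loss goes down. The names `bigram_loss_and_grad` and `sanity_check`, the sample text, and the learning rate are all assumptions for illustration.

```python
import math


def bigram_loss_and_grad(logits, pairs):
    """Mean cross-entropy of a bigram model and its gradient w.r.t. the logits table."""
    n = len(logits)
    grad = [[0.0] * n for _ in range(n)]
    loss = 0.0
    for prev, nxt in pairs:
        row = logits[prev]
        m = max(row)  # subtract the max for numerical stability
        exps = [math.exp(v - m) for v in row]
        total = sum(exps)
        probs = [e / total for e in exps]
        loss -= math.log(probs[nxt])
        # For softmax + cross-entropy, d(loss)/d(logits) is (probs - one_hot).
        for j in range(n):
            grad[prev][j] += probs[j] - (1.0 if j == nxt else 0.0)
    count = len(pairs)
    grad = [[g / count for g in row] for row in grad]
    return loss / count, grad


def sanity_check(text, steps=2, lr=1.0):
    """Run a few gradient descent steps; return the loss before each step plus the final loss."""
    vocab = sorted(set(text))
    index = {ch: i for i, ch in enumerate(vocab)}
    ids = [index[ch] for ch in text]
    pairs = list(zip(ids, ids[1:]))
    logits = [[0.0] * len(vocab) for _ in vocab]  # uniform predictions at the start
    losses = []
    for _ in range(steps):
        loss, grad = bigram_loss_and_grad(logits, pairs)
        losses.append(loss)
        for i, row in enumerate(grad):
            for j, g in enumerate(row):
                logits[i][j] -= lr * g
    final_loss, _ = bigram_loss_and_grad(logits, pairs)
    losses.append(final_loss)
    return losses


if __name__ == "__main__":
    # Hypothetical sample text; any short string works.
    print(sanity_check("hello world, hello nanochat"))
```

If the printed losses start near log(vocabulary size) and shrink with each step, the forward pass, gradient, and update are at least wired together correctly, which is all this kind of check is meant to show before swapping in downloaded weights.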
> today you can get a kindergartener level chat model for about $100. Not hard to imagine the same model costing $10 of compute in a few years.
No, it's extremely hard to imagine since I used one of Karpathy's own models to have a basic chat bot like six years ago. Yes, it spoke nonsense; so did my GPT-2 fine tune four years ago and so does this.
And so does ChatGPT
Improvement is linear at best. I still think it's actually a log curve and GPT3 was the peak of the "fun" part of the curve. The only evidence I've seen otherwise is bullshit benchmarks, "agents" that increase performance 2x by increasing token usage 100x, and excited salesmen proclaiming the imminence of AGI
Apparently 800 million weekly users are finding ChatGPT useful in its present state.
1. According to who? OpenAI?
2. Its current state is "basically free and containing no ads". I don't think this will remain true given that, as far as I know, the product is very much not making money.
Yes, that number is according to OpenAI. They released that 800m number at DevDay last week.
The most recent leaked annualized revenue rate was $12bn/year. They're spending a lot more than that but convincing customers to hand over $12bn is still a very strong indicator of demand. https://www.theinformation.com/articles/openai-hits-12-billi...
This. It looks like one of the keys to maintaining open source is to ensure OSS developers have access to capable models. In the best of worlds, LLM vendors would recognize that open source software is the commons that feeds their models and ensure it flourishes.
In the real world...
(This is a bit ranty, but it comes from a sincere desire for a better world, and from having been the recipient of personal attacks for believing a better world is achievable by a different path than the one others favor.)
I feel like this point of view is an ideal not shared by one of the main branches of anti-AI sentiment.
The idea of intellectual property works against this. Rather than contributing to humanity directly, ownership of information is accumulated by individuals and then rented to humanity.
At the same time I agree that people should be able to have a livelihood that affords them the ability to create new intellectual contributions.
The service Karpathy is providing is also being provided by thousands of YouTube creators in a huge variety of topics. It's a little sad that so many must support their efforts with sponsorships from sources with varying degrees of ethical behaviour. Patreon is better but still not ideal. I sincerely believe this _is_ one of the best ways to contribute to society.
A recent Daily Show had Jon Stewart describe training AI as strip mining human knowledge. Training AI is regularly described as theft, as if this position were a given with no counterargument possible. It is opinion masquerading as fact. This saddens me because it suggests that the war to control the narrative is being won by people who want to entrench a hypercapitalistic vision of ownership, where not only is a particular expression of an idea ownable, but its owner can also stake a claim to some part of any idea that comes from viewing that expression.
I cannot see any way that this viewpoint would aid humanity as a whole; instead it would assign benefits to a collection of individuals. The ability to trade intellectual property means that ownership inevitably gets passed to a smaller and smaller pool of individuals over time.
I think we really do need a new way to consider these issues in light of the modern world. When mentioning these thoughts to others a common refrain is that it doesn't matter because the powers that be (and their lobbyists) will prevent any fix from happening. I have never been fond of that particular fatalism, especially when it inhibits discussion of what would be better.
Awesome approach.
I'm all for abolishing IP if all AIs are owned communally. I.e. ideally they're utilities or flat out co-ops like some Spanish businesses.
https://en.wikipedia.org/wiki/Mondragon_Corporation
Consum (Spanish supermarket).
They don't get to use everything communally and then capitalism their way forward.
I recommend his ANN/LLM from scratch videos to people a lot because not only is he a clear instructor, but his code tends to be very Pythonic and just the right balance of terse but readable (not counting the PyTorch vectorization stuff, but that's not his fault, it's just complex). So I think people benefit just from watching and imitating his code style.
Then a single person who's learned those skills decides to poison all of us thanks to the skills acquired.
strong +1 - developers like him are heroes
While documenting a build path is nice, IMHO renting hardware nobody can afford from VC-backed cloud providers using cold hard cash to produce clones of legacy tech using toy datasets under the guise of education is propping up the AI bubble and primarily helping institutional shareholders in those AI bubble companies, particularly their hardware supplier NVidia. Personally I do not see this as helping people or humanity.
This would sit better with me if the repo included a first tier use case for local execution, non-NVidia hardware reference, etc.
"This would sit better with me if the repo included a first tier use case for local execution, non-NVidia hardware reference, etc."
This is a pretty disheartening way to respond to something like this. Someone puts a great deal of effort into giving something interesting away for free, and is told "you should have also done THIS work for free as well in order for me to value your contribution".
It is an objective and transparent response based on free software world norms. Feel free to interpret differently and to be disheartened. Hell, many of us are disheartened by the AI VC political theater we are seeing right now: experienced programmers, artists, lawyers, perhaps much of humanity. Let's stick to objective elements of the discussion, not emotional opinion.
If you can't afford $100, or can't learn how to train it locally with more time and less money, then this isn't something you should be focusing on at all.
It is amusing to note the dichotomy between the clearly compassionate, empathetic and altruistic perspective displayed here and the comically overstated framing of helping humanity.
(Shrug) Other sites beckon.
I think you got your proportions slightly wrong there. This will be contributing as much to an AI bubble as a kid tinkering around with combustion is contributing to global warming.
Not really. Anything that guy does sets the tone for an extended cacophony of fans and followers. It would be a sad day when nobody critically assesses the motivations, effects and framing of those moves. I question the claim this move helps humanity and stand by the assessment it's just more feeding an unfree ecosystem which equates to propping up the bubble.
As noble as the goal sounds, I think it's wrong.
Software is just a tool. Much like a hammer, a knife, or ammonium nitrate, it can be used for either good or bad.
I say this as someone who has spent almost 15 years writing software in my free time and publishing it as open source: building software and allowing anyone to use it does not automatically make other people's lives better.
A lot of my work has been used for bad purposes or what some people would consider bad purposes - cheating on tests, cheating in games, accessing personal information without permission, and in one case my work contributed to someone's doxxing. That's because as soon as you publish it, you lose control over it.
But at least with open source software, every person can use it to the same extent so if the majority of people are good, the result is likely to be more positive than negative.
With what is called AI today, only the largest corporations can afford to train the models which means they are controlled by people who have entirely different incentives from the general working population and many of whom have quite obvious antisocial personality traits.
At least 2 billion people live in dictatorships. AI has the potential to become a tool of mass surveillance and total oppression from which those countries will never recover because just like the models can detect a woman is pregnant before she knows it, it will detect a dissenter long before dissent turns into resistance.
I don't have high hopes for AI to be a force for good and teaching people how toy models work, as fun as it is, is not gonna change it.
"With what is called AI today, only the largest corporations can afford to train the models"
I take it you're very positive about Andrej's new project which allows anyone to train a model for a few hundred dollars which is comparable to the state-of-the-art from just 5 years ago then.
I would genuinely love to think otherwise. But I've grown up seeing good things being used in stupid ways (not necessarily out of malice).
> At least 2 billion people live in dictatorships. AI has the potential to become a tool of mass surveillance and total oppression from which those countries will never recover because just like the models can detect a woman is pregnant before she knows it, it will detect a dissenter long before dissent turns into resistance.
It already works like this in your precious western democracies and they didn't need AI to be authoritarian total surveillance states in spirit, with quite a lot of support from a propagandized populace that begged for or pretended to agree with the infringement of their civil rights because of terrorism, drugs, covid or protecting the poor poor children.
You can combat tech with legislation and culture but the legislation and culture were way beyond the tech in being extremely authoritarian in the first place.
I'm afraid the technology will do more damage because many people will abuse it for fake news and misinformation.
Yeah it feels similar to inventing the nuke. Or it’s even more insidious because the harmful effects of the tech are not nearly as obvious or immediate as the good effects, so less restraint is applied. But also, similar to the nuke, once the knowledge on how to do it is out there, someone’s going to use it, which obligates everyone else to use it to keep up.
I would adjust your formula to:
(number of people you help x how much you help them) - (number of people you harm x how much you harm them)
For example: harming all the content creators of the world a little bit by stealing their work without compensation or permission. How much does that cost globally, year after year? How do we even quantify the long-term consequences of that? Stuff like that.