drivebyhooting
a month ago
In my opinion LLMs are intellectual property theft. Just as if I started distributing copies of books. This substantially reduces the incentive for the creation of new IP.
All written text, artwork, etc. needs to come imbued with a GPL-style license: if you train your model on this, your weights and training code must be published.
theropost
a month ago
I think there is a real issue here, but I do not think it is as simple as calling it theft in the same way as copying books. The bigger problem is incentives. We built a system where writing docs, tutorials, and open technical content paid off indirectly through traffic, subscriptions, or services. LLMs get a lot of value from that work, but they also break the loop that used to send value back to the people and companies who created it.
The Tailwind CSS situation is a good example. They built something genuinely useful, adoption exploded, and in the past that would have meant more traffic, more visibility, and more revenue. Now the usage still explodes, but the traffic disappears because people get answers directly from LLMs. The value is clearly there, but the money never reaches the source. That is less a moral problem and more an economic one.
Ideas like GPL-style licensing point at the right tension, but they are hard to apply after the fact. These models were built during a massive spending phase, financed by huge amounts of capital and debt, and they are not even profitable yet. Figuring out royalties on top of that, while the infrastructure is already in place and rolling out at scale, is extremely hard.
That is why this feels like a much bigger governance problem. We have a system that clearly creates value, but no longer distributes it in a sustainable way. I am not sure our policies or institutions are ready to catch up to that reality yet.
Brybry
a month ago
> We have a system that clearly creates value, but no longer distributes it in a sustainable way
The same thing happened (and is still happening) with news media and aggregation/embedding like Google News or Facebook.
I don't know if anyone has found a working solution yet. There have been some laws passed and licensing deals [1]. But they don't really seem to be working out [2].
[1] https://www.cjr.org/the_media_today/canada_australia_platfor...
[2] https://www.abc.net.au/news/2025-04-02/media-bargaining-code...
ViscountPenguin
a month ago
I'm not sure that I'd call [2] a case of it not working out, just like I wouldn't call the equivalent pressure from the USA to dismantle Medicare, our public health system, a case of Medicare not working out.
The biggest issue with the scheme is the fact that it was structured to explicitly favour media incumbents, and is therefore politically unpopular.
w10-1
a month ago
> I do not think it is as simple as calling it theft in the same way as copying books
Aside from the incentive problem, there is a kind of theft, known as conversion: when you were granted a license under some conditions, and you went beyond them - you kept the car past your rental date, etc. In this case, the documentation is for people to read; AI using it to answer questions is a kind of conversion (no, not fair use). But these license limits are mostly implicit in the assumption that (only) people are reading, or buried in unenforceable site terms of use. So it's a squishy kind of stealing after breaching a squishy kind of contract - too fuzzy to stop incentivized parties.
jefftk
a month ago
Why do you think there was an implicit agreement that documentation was only intended for humans? I've written a lot of documentation, much of it open source, and I'm generally very excited that it has proved additionally useful via LLMs. If you had asked me in 2010 whether that was something I intended in writing docs, I'm pretty sure I would have said something like "that's science fiction, but sure".
b112
a month ago
You still intended it for humans. Intent is defined by what one is aiming for, and without knowledge of an alternative, that was your intent.
100% I get that you are OK with it being used for non-human ingestion. And I think many might be OK with that.
One thing: I'm not sure how helpful the documentation is. I think we're getting training out of examples, not docs. This makes me think... we could test this by creating a new pseudo-language, and then provide no examples, only docs.
If the LLM can then code effectively after reading the docs, we'd have a successful test. Otherwise? It's all parroting.
johnpaulkiser
a month ago
There will be no royalties, simply make all the models that trained on the public internet also be required to be public.
This won't help tailwind in this case, but it'll change the answer to "Should I publish this thing free online?" from "No, because a few AI companies are going to exclusively benefit from it" to "Yes, I want to contribute to the corpus of human knowledge."
amrocha
a month ago
Contributing to human knowledge doesn’t pay the bills though
imiric
a month ago
It can. The problem is the practice of using open source as a marketing funnel.
There are many projects that love to brag about being open source (it's "free"!), only to lock useful features behind a paywall, or do the inevitable license rug pull after other companies start profiting from the freedoms they've provided them. This is the same tactic used by drug dealers to get you hooked on the product.
Instead, the primary incentive to release a project as open source should be the desire to contribute to the corpus of human knowledge. That doesn't mean that you have to abandon any business model around the project, but that shouldn't be your main goal. There are many successful companies built around OSS that balance this correctly.
"AI" tools and services corrupt this intention. They leech off the public good will, and concentrate the data under the control of a single company. This forces well-intentioned actors to abandon open source, since instead of contributing to human knowledge, their work contributes to "AI" companies. I'm frankly not upset when this affects projects who were abusing open source to begin with.
So GP has a point. Forcing "AI" tools, and even more crucially, the data they collect and use, to be free/libre, would restore the incentive for people to want to provide a public good.
The narrative that "AI" will bring world prosperity is a fantasy promoted by the people who will profit the most. The opposite is true: it will concentrate wealth and power in the hands of a few even more than it is today. It will corrupt the last vestiges of digital freedoms we still enjoy today.
I hope we can pass regulation that prevents this from happening, but I'm not holding my breath. These people are already in power, and governments are increasingly in symbiotic relationships with them.
catlifeonmars
a month ago
> The narrative that "AI" will bring world prosperity is a fantasy promoted by the people who will profit the most. The opposite is true: it will concentrate wealth and power in the hands of a few even more than it is today. It will corrupt the last vestiges of digital freedoms we still enjoy today.
This is on point.
delusional
a month ago
> We have a system that clearly creates value, but no longer distributes it in a sustainable way.
It does not "create value"; it harvests value and redirects the proceeds it accrues towards its owners. The business model is a middleman that arbitrages the content by separating it from the delivery.
Software licensing has been broken for two decades. That's why free software isn't financially viable for anybody except a tiny minority. It should be. The entire industry has been operating on charity, and the rich mega corporations have decided they're no longer going to be charitable.
sodapopcan
a month ago
It's not as simple as calling it theft, but it is simply theft, plus the other good points you made.
visarga
a month ago
Copying is theft, generating is theft, and it is not even taking anything they had. Future revenue can't be stolen.
I think once it becomes infrastructure and widely used knowledge the authors can't claim control anymore. Or shouldn't.
sodapopcan
a month ago
> Future revenue can't be stolen.
This is a big eye-roll, but otherwise ya, this is one way to think of it. It's not all about money, though. The people running these companies are just taking, en masse, without credit, and wanting credit is a basic human desire. Of course there is a discussion of whether or not we should evolve beyond that. It feels incredibly dystopian to me, though.
pico303
a month ago
The problem is there was a social contract. Someone spent their time and money to create a product that they shared for free, provided you visit their site and see their offerings. In this way they could afford to keep making this free product that everyone benefited from.
LLMs broke that social contract. Now that product will likely go away.
People can twist themselves into knots about how LLMs create “value” and that makes all of this ok, but the truth is they stole information to generate a new product that generates revenue for themselves at the cost of other people’s work. This is literally theft. This is what copyright law is meant to protect. If LLM manufacturers are making money off someone’s work, they need to compensate people for that work, same as any client or customer.
LLM companies are not doing this for the good of society; they are making money off it. And I'm sure if someone comes along with LLM 2.0 and rips them off, they're going to be screaming to governments and attorneys for protection.
The ironic part of all of this is that LLMs are literally killing the businesses they need to survive. When people stop visiting (and paying) Tailwind, Wikipedia, news sites, weather, and so on, and only use LLMs, those sites and services will die. Heck, there’s even good reason to think LLMs will kill the Internet at large, at least as an information source. Why in the hell would I publish news or a book or events on the Internet if it’s just going to be stolen and illegally republished through an LLM without compensating me for my work? Once this information goes away or is locked behind nothing but paywalls, I hope everyone is ready for the end of the free ride.
senko
a month ago
I support your right to have an opinion, but in my opinion, thank God this is just your opinion.
Copyright, as practiced in the late 20th century and this one, is a tool for big corps to extract profits from actual artists, creators, and consumers of this art[0] equally. Starving artists do not actually benefit.
Look at Spotify (owned and squeezed by record labels) giving 70% of the revenue to the record labels, while artists get peanuts. Look at Disney deciding it doesn't need to pay royalties to book writers. Hell, look at Disney's hits from Snow White onwards, and then apply your "LLMs are IP theft" logic to that.
Here's what Cory Doctorow, a book author and critic of AI, has to say about it in [1]:
> So what is the alternative? A lot of artists and their allies think they have an answer: they say we should extend copyright to cover the activities associated with training a model.
> And I'm here to tell you they are wrong: wrong because this would inflict terrible collateral damage on socially beneficial activities, and it would represent a massive expansion of copyright over activities that are currently permitted – for good reason!.
---
> All written text, art work, etc needs to come imbued with a GPL style license
A GPL-style license has long been known not to work well for artifacts other than code. That's the whole reason for the existence of Creative Commons, the GNU Free Documentation License, and others.
[0] "consumers of art" sounds abhorrent, yet that's exactly what we are [1] https://pluralistic.net/2025/12/05/pop-that-bubble/
testing22321
a month ago
So you don’t like big corps getting ever richer by hiding behind copyright.
How about my books?
I’ve published a few, make maybe $500 a month from them.
Is it fine for the LLMs to rip them off?
Altern4tiveAcc
a month ago
> Is it fine for the LLMs to rip them off?
Yes. It is good (and IMO should be encouraged) that derivative works can be made, even if it would make you less money.
testing22321
a month ago
Now you’ve taken away the incentive for me to write more, or for any new author to write a book. Goodbye books I guess.
Altern4tiveAcc
a month ago
I'm more than happy to read from authors writing freedom-respecting works instead, thanks.
Although getting rid of the copyright/IP nonsense for works that already exist is a big win on its own.
testing22321
a month ago
> I'm more than happy to read from authors writing freedom-respecting works instead, thanks.
You missed the point entirely.
Nobody will write books when big companies will just rip them off and take the profits.
Altern4tiveAcc
a month ago
> You missed the point entirely.
I didn't miss it, I just don't buy it. Lots of people create and write in their free time, without any financial incentives whatsoever.
I'm okay with writing books for profit, though. But if your business case depends on artificially limiting what users can do with their copies, then your business is fundamentally unethical, whether it's a one-man company or OpenAI. It's not a bad thing for such a business to go under.
senko
a month ago
1. Will people stop buying your books if LLMs have the information from them?
2. Will people stop buying your books if they can get them from the library? Is a library ripping you off?
3. Assuming your books are non-fiction (otherwise the answer to (1) would be a clear "no"), am I ripping you off if I read your books, create a course that teaches the same things (that you taught me through your book) and earn mega-money because I'm mega-skilled at marketing?
4. How about if I lend my copy to dozens of my friends, who are all very interested in the stuff you write but don't want to pay themselves?
5. Did OpenAI go to the bookstore, buy your book and scan it? Or did Amazon or some other ebook retailer just give them the book PDF when they asked nicely? How did the rip-off happen?
6. If an Anthropic employee buys your book in a bookstore, scans it and destroys the physical copy, and the digital equivalent is only used to train Claude, is that a ripoff?
This stuff is complex and as a society we're just starting to grapple with the consequences. Cory's making the case against copyright being used as a tool much more eloquently than I am - I encourage you to read it if you haven't already.
BTW in your particular case, I'd say you're pretty safe. Nobody stops buying books because they can get the same info from LLMs. If that's your concern, you might as well be mad at the Internet at large.
GlacierFox
a month ago
So this logic is essentially: Look at all these ways you're already getting ripped off. What's one more?! You should be grateful they're siphoning off all your work!
You've got a convert here. I don't think I'll publish my next book. I might just email it straight to OpenAI.
And Cory Doctorow - I've attempted a few of his books. Felt like I was reading young adult fiction. He's pretty much the '2 prescient statements and a few average books' guy.
iterateoften
a month ago
I stopped writing open source projects on GitHub, because why put a bunch of work into something for others to train off of, without any regard for the original projects?
__MatrixMan__
a month ago
I don't understand this mindset. I solve problems on stackoverflow and github because I want those problems to stay solved. If the fixes are more convenient for people to access as weights in an LLM... who cares?
I'd be all for forcing these companies to open source their models. I'm game to hear other proposals. But "just stop contributing to the commons" strikes me as a very negative result here.
We desperately need better legal abstractions for data-about-me and data-I-created so that we can stop using my-data as a one-size-fits-all square peg. Property is just out of place here.
tombert
a month ago
I have mixed opinions on the "AI=theft" argument people make, and I generally lean towards "it's not theft", but I do see the argument.
If I put something on Github with a GPL 3 license, it's supposed to require anyone with access to the binary to also have access to the source code. The concern is, if you think that it is theft, then someone can train an LLM on your GPL code, and then a for-profit corporation can use the code (or any clever algorithms you've come up with) and effectively "launder" your use of GPL code and make money in the process. It basically would be converting your code from Copyleft to Public Domain, which I think a lot of people would have an issue with.
dangus
a month ago
The thing is, LLMs aren’t redistributing your code. You’d have a minuscule chance of an LLM actually reproducing your code verbatim without major modifications.
Copyright and copyleft only deal with source code distribution. Your last sentence is not really true from a factual perspective.
I think if you really believe in the open source free software mentality that code should be available to help everyone and improvements to it should also be available and not locked up behind a corporate wall (e.g., a company using GPL code and releasing it with modifications without redistributing the source code), LLMs should be the least of your worries since they don’t do that action. On a literal level they don’t violate GPLv2/v3.
Perhaps copyright law needs new concepts to respond to this change in capability compared to the past, but so far there has been very little legal success with companies and individuals trying to litigate AI companies for copyright violations. Direct violations have been rare and only get more rare over time as training methods evolve.
tombert
a month ago
Again, I tend to fall more on the "it's not theft" side of the debate.
That said, haven't some of the complaints about Copilot and the like been specifically because they reproduce large chunks of code verbatim?
catlifeonmars
a month ago
> You’d have a minuscule chance of an LLM actually reproducing your code verbatim without major modifications.
Wait, are you kidding? This is literally a problem we have today with tools like Copilot.
baranul
24 days ago
The thing is, what you are describing is arguably theft. It is purposefully circumventing the license and restrictions chosen by the author, in order to steal their code, so that it can be sold and used for profit. This is along the same lines of why book authors and artists have been suing AI companies.
Another point, there is a lot of free and permissive license content to train AI on, where the GPL or copyright can be respected. In many cases, the violating AI companies knew what they were doing was wrong.
techpression
a month ago
I find it very easy to understand: people don't generally want to work for free to support billionaires, and they have few venues to act on that; this is one of them.
There are no "commons" in this scenario; there are a few frontier labs owning everything (taking it without attribution), and they have the capability to take it away, or increase prices to a point where it becomes a tool for the rich.
Nobody is doing this for the good of anything, it’s a money grab.
__MatrixMan__
a month ago
Were these contributions not a radical act against zero-sum games in the first place? And now you're gonna let the zero-sum people win by restricting your own outputs to similarly zero-sum endeavors?
I don't wanna look a gift horse in the mouth here. I'm happy to have benefited from whatever contributions were originally forthcoming and I wouldn't begrudge anybody for no longer going above and beyond and instead reverting to normal behavior.
I just don't get it, it's like you're opposed to people building walls, but you see a particularly large wall which makes you mad, so your response is to go build a wall yourself.
imiric
a month ago
It's not about building a wall. It's about ensuring that the terms of the license chosen by the author are respected.
This is why I think permissive licenses are a mistake for most projects. Unlike copyleft licenses, they allow users to take away from users of derivative works the freedoms they themselves enjoyed. It's no surprise that dishonest actors take advantage of this for their own gain. This is the paradox of tolerance.
"AI" companies take this a step further, and completely disregard the original license. Whereas copyleft would somewhat be a deterrent for potential abusers, it's not for this new wave of companies. They can hide behind the already loosely defined legal frameworks, and claim that the data is derivative enough, or impossible to trace back, or what have you. It's dishonest at best, and corrupts the last remnants of public good will we still enjoy on the internet.
We need new legal frameworks for this technology, but since that is a glacial process, companies can get rich in the meantime. Especially shovel salespeople.
AshamedCaptain
a month ago
https://www.softwareheritage.org/ will index it anyway.
Also, if you publish your code in your own server, it will be DDoSed to death by the many robots that will try to scrape it simultaneously.
williamcotton
a month ago
I'm writing a few DSLs a year at this point and I would very much like them to be part of the training data for LLMs!
journal
a month ago
That's why I don't add comments to my commits; I don't want them to know the reason for the changes.
wnjenrbr
a month ago
Good; we don't want code that people are possessive of in the software commons. The attitude of being concerned about what people do with your output means that nobody should touch your output; too big a risk of drama.
We don’t own anything we release to the world.
__MatrixMan__
a month ago
"Good riddance" is a pretty lousy position to take re: volunteer work. It should be: "how can we fix this?"
cogman10
a month ago
> if you train your model on this, your weights and training code must be published.
The problem here is enforcement.
It's well known that AI companies simply pirated content in order to train their models. No amount of licensing really helps in that scenario.
delfinom
a month ago
The problem here is "money".
The AI goldrush has proven that intellectual property laws are null and void. Money is all that matters.
ronsor
a month ago
> The AI goldrush has proven that intellectual property laws are null and void. Money is all that matters.
Indeed, they never really mattered. They were a tool for large corporations to make money, and they will go away if they can no longer serve that purpose. Anyone who thought there was a real moral or ethical basis to "intellectual property" laws fell for propaganda and got scammed as a result.
themanmaran
a month ago
The problem here is the "so what?"
Imagine OpenAI is required by law to list their weights on huggingface. The occasional nerd with enough GPUs can now self host.
How does this solve any tangible problems with LLMs regurgitating someone else's work?
hackyhacky
a month ago
> How does this solve any tangible problems with LLMs regurgitating someone else's work?
I'm not the OP, but here's my from-the-hip answer: if weights are public, building and operating an LLM is no longer a business plan in and of itself, as anyone could operate the same LLM. Therefore companies like OpenAI will be disincentivized from simply redirecting web traffic to their own site.
cogman10
a month ago
I wasn't really the one pushing the GPL idea. The best I could say is that at least that information would be available to everyone rather than being tightly controlled by the company that stole the source material to create it in the first place. It might also dissuade LLM creators from mass piracy, since a competitor could take their models and start hosting them.
kubanczyk
a month ago
> imbued with a GPL style license
GPL died. Licenses died.
Explanation: LLMs were trained on GPL code too. The fact that all the previously-paranoid businesses that used to warn SWEs not to touch GPL code with a ten-foot pole are now fearlessly embracing LLMs' outputs means that, de facto, they consider an LLM their license-washing machine. Courts are going to rubber-stamp it because billions of dollars, etc.
hk__2
a month ago
Do I have to publish my book for free because I got inspiration from hundreds of other books I read during my life?
antihipocrat
a month ago
Humans are punished for plagiarism all the time. Myriad examples exist of students being disenrolled from college, professionals being fired, and personal reputations tarnished forever.
When an LLM is trained on copyrighted works and regurgitates these works verbatim without consent or compensation, and the result is sold for profit, there is currently no negative impact for the company selling the LLM service.
blibble
a month ago
false equivalence because machines are not human beings
a lossy compression algorithm is not "inspired" when it is fed copyrighted input
eddd-ddde
a month ago
> lossy compression algorithm is not "inspired" when it is fed copyrighted input
That's exactly what happens when you read. Copyrighted input fed straight into your brain, a lossy storage and processing machine.
LPisGood
a month ago
I think it’s a pretty easy principle that machines are not people and people learning should be treated differently than machines learning
Terr_
a month ago
You see this principle in privacy laws too.
I can be in a room looking at something with my eyeballs and listening with my ears perfectly legally... But it would not be legal if I replaced myself with a humanoid mannequin with a video camera for a head.
zephen
a month ago
You can even write down what you are looking at and listening to, although in some cases disseminating, e.g., verbatim copies in your writing could be considered copying.
But it is automatically copying if you use a copier.
user432678
a month ago
Following your analogy, parrots should be considered human.
Ekaros
a month ago
The issue to me is that I or someone else bought those books. Or, in the case of local libraries, the authors got money for my borrowing a copy.
And I cannot copy-paste myself to discuss with thousands or millions of users at a time.
To me the clear solution is to make some large payment to each author of material used in training, per training run of a model, say in the 10k to 100k range.
troupo
a month ago
If your book reproduces something 95% verbatim, you won't even be able to publish it.
hk__2
a month ago
Exactly. We assess plagiarism by checking the output (the book), not the input (how many books I've read before). It's not an issue to train an LLM on copyrighted resources if its output is randomized enough.
layer8
a month ago
If you are plagiarizing, “for free” doesn’t even save you.
ralph84
a month ago
We already have more IP than any human could ever consume. Why do we need to incentivize anything? Those who are motivated by the creation itself will continue to create. Those who are motivated by the possibility of extracting rent may create less. Not sure that's a bad thing for humanity as a whole.
johnpaulkiser
a month ago
> if you train your model on this, your weights and training code must be published.
This feels like the simplest & best single regulation that can be applied in this industry.
kimixa
a month ago
I feel that, to be consistent, the output of that model would also have to be under that same open license.
I can see this being extremely limiting for training data, as only "compatible" licensed data could be packaged together to train each model.
danaris
a month ago
Well, yes.
That's part of the point.
only-one1701
a month ago
B-b-but what if someone uses the weights and training code to train their own models!!
Dilettante_
a month ago
It'd substantially reduce the incentive for the training of new models.
blibble
a month ago
burglary as a business has an extremely high margin
for the burglar
mrgoldenbrown
a month ago
That's a good thing if it means it would reduce the incentive for mega corps to steal other people's work.
Forgeties79
a month ago
So what? Figure it out. They have billions in investor funding and we’re supposed to just let them keep behaving this way at our expense?
Facebook was busted torrenting all sorts of things in violation of laws/regulations that would lead to my internet being cut off by my ISP. They did it at scale and faced no consequences. Scraping sites, taking down public libraries, torrenting, they just do whatever they want with impunity. You should be angry!
drivebyhooting
a month ago
Meanwhile look at what happened to Aaron Swartz. There's no justice until corporations are held accountable.
Forgeties79
a month ago
I almost referenced him as well
baxtr
a month ago
This is one way to look at it.
The other way is to argue that LLMs democratize access to knowledge: anyone has access to everything ever written by humanity.
Crazy impressive if you ask me.
antihipocrat
a month ago
If the entities democratizing access weren't companies worth hundreds of billions of dollars with a requirement to prioritize substantial returns for their investors, I'd agree with you!
Difwif
a month ago
This is temporary. AI models have their own Moore's law. Yes, the mega corps will have the best models, but soon enough what is currently SOTA will be open source and will run on your own local machine if you want.
The mega corps are getting all of us and the investors to fund the R&D.
aszen
a month ago
How? You don't know what the LLM was trained on and don't know if it has any bias. IMO LLMs are a disaster for knowledge work because they act like a black box.
zephen
a month ago
Yes, it seems that way now.
The first one's free.
After you're hooked, and don't know how to think any more for yourself, and all the primary sources have folded, the deal will be altered.
catlifeonmars
a month ago
The internet already democratized access to knowledge. (Hosted) LLMs put that free knowledge behind a paywall. Taken by itself this seems fine: how you access the knowledge (via internet or chat bot) is still up to you. However, the argument is that knowledge producers aren't incentivized to publish in a model where everything is fetched through agents. Couple that with closed-weight models and you will (eventually) have overall worse access to less knowledge, at a higher personal cost.
dangus
a month ago
I both agree and disagree with you.
The thing is, copyright law is not really on your side. Viewing copyrighted material without paying for it is not generally something people get fined for. A lot of training falls under fair use, which overrides whatever license you come up with. Disney can't stop me from uploading clips of their movies alongside commentary and review, because fair use allows that. LLMs generally aren't redistributing code, which is the thing that copyright protects.
If I inspect some GPL code and get inspired by it and write something similar, the GPL license doesn’t apply to me.
It has always been the case that if you don’t want other people to apply fair use to your works, your only recourse is to keep those works private. I suspect that now individuals and companies that don’t want their code to be trained on will simply keep the code private.
Now, there have been times where LLMs have reproduced verbatim copyright material. The NYTimes sued OpenAI over this issue. I believe they’ve settled and come up with a licensing scheme unless I’m mixing up my news stories.
Second thing, your issue becomes moot if there exists a model that only trains off of MIT-licensed code, and there is a TON of that code out there.
Third thing, your issue becomes moot if users have agreed to submit their code for training, like what the GitHub ToS does for users who don’t change their settings, or if giant companies with giant code bases just use their own code to train LLMs.
Where I agree with you is that perhaps copyright law should evolve. Still, I think there’s a practical “cat is out of the bag” issue.
qsera
a month ago
>This substantially reduces the incentive for the creation of new IP.
Not all IP, though; only some kinds. Some IP is created for the sake of creating it and nothing else.
kimixa
a month ago
The psychology behind "creating it for the sake of creating it" can also be significantly changed by seeing someone then take it and monetize it without so much as a "thank you".
It's come up quite often even before AI when people released things under significantly looser licenses than they really intended and imagined them being used.
dangoodmanUT
a month ago
But it’s not theft, because you’re not redistributing. It’s allowed, just like humans are allowed to learn from copyrighted content.
TrackerFF
a month ago
Sure. But if I listen to some song, copy it, and release it, I could get sued. Even if I claim that I'm merely inspired by the original content, the court doesn't care.
You don't need to redistribute the original material, it's enough that you just copied it.
Altern4tiveAcc
a month ago
> But if I see listen to some song, and copy it, and release it, I could get sued.
This is what should be fixed in the first place, then. You shouldn't get sued for what you do with your copy of a song.
amelius
a month ago
> just like humans are allowed to learn from copyrighted content
humans learning : machines learning == whale swimming : submarine swimming
It's not 100% the same thing. Therefore you cannot base any rights on it.
If you still don't buy it, consider this analogy:
killing a human vs. destroying a machine
Thank god that we're not using your line of thinking here.
mrcwinn
a month ago
Commercialization may be a net good for open source, in that it helps sustain the project’s investment, but I don’t think that means that you’re somehow entitled to a commercial business just because you contributed something to the community.
The moment Tailwind becomes a for-profit, commercial business, they have to duke it out just like anyone else. If the thing you sell is not defensible, it means you have a brittle business model. If I’m allowed to take Tailwind, the open source project, and build something commercial around it, I don’t see why OpenAI or Anthropic cannot.
hoppp
a month ago
The difference is that they are reselling it directly. They charge for inference that outputs Tailwind.
It's fine to have a project that generates HTML/CSS as long as the users can find the docs for the dependencies, but when you take away the docs and stop giving real credit to the creators, it starts feeling more like plagiarism, and that is what's costing Tailwind here.
what
a month ago
Wouldn’t that mean any freelancer that uses Tailwind is reselling it?
hoppp
22 days ago
The freelancer does navigate to the documentation page and might propose that the client buy the templates to speed up work.
That's a big difference. Freelancers using it can bring compounding value, as they can sell templates to every client.
AI using it brings no value, as the AI won't recommend buying the templates.
Animats
a month ago
Education can be viewed as intellectual property theft. There have been periods in history when it was. "How to take an elevation from a plan" was a trade secret of medieval builders and only revealed to guild members. How a power loom works was export-controlled information in the 1800s, and people who knew how a loom works were not allowed to emigrate from England.
The problem is that LLMs are better than people at this stuff. They can read a huge quantity of publicly available information and organize it into a form where the LLM can do things with it. That's what education does, more slowly and at greater expense.
paradite
a month ago
By your analogy human brains are also IP thieves, because they ingest what's available in the world, mix and match it, and synthesize slightly different IP based on it.
GrowingSideways
a month ago
Intellectual property was kind of a gimmick to begin with, though. Let's not pretend that copyright and patents ever made any sense.
martin_drapeau
a month ago
They exist to protect the creator/inventor and allow them to get an ROI on their invested time/effort. But honestly, today, the abundance of content, especially content that can be generated by LLMs, completely breaks this. We're overwhelmed with choice. Content has been commoditized. People will need to come to grips with that and find other ways to get an ROI.
The article does provide a hint: "Operate". One needs to get paid for what LLMs cannot do. A good example is Laravel: they built services like Forge, Cloud, and Nightwatch around open source.
GrowingSideways
a month ago
> They exist to protect the creator/inventor and allows them to get an ROI on their invested time/effort.
Yes, and this betrayed the entire concept of the US as a pro-human market.
michaf
a month ago
Is there such a license? Or any license with special clauses for LLMs? Is it enforceable? Could someone 'poison' an LLM training run by injecting just one such licensed document? I am genuinely curious about what levers exist (or are conceivable) to protect your own IP from becoming LLM training data, if regular copyright does not qualify.
jefftk
a month ago
This isn't the kind of thing you can do with a license, as long as training a model doesn't require a license. Now, that's an open question legally in the US, and there are active lawsuits, but that does seem like the way it's most likely to play out.
journal
a month ago
If I was able to memorize every pixel value to reconstruct a movie from memory, would that be theft?
simion314
a month ago
>If I was able to memorize every pixel value to reconstruct a movie from memory, would that be theft?
Do an experiment: memorize a popular short poem, then publish it under your name (though I suggest checking the laws in your region first, and also considering that it might affect your reputation).
IMO it's the same if ChatGPT memorizes my poem and then, when you ask it for a poem, you copy-paste my poem from ChatGPT and publish it as your own.
AlienRobot
a month ago
It's already imbued with copyright: copying it without a license is already infringement.
bwfan123
a month ago
> This substantially reduces the incentive for the creation of new IP
And as a result of this, the models will start consuming their own output for training. This will create new incentives to promote human-generated code.
venndeezl
a month ago
In my opinion information wants to be free. It's wild to me seeing the tech world veer into hyper-capitalism and IP protectionism. Complete 180 from the 00s.
IMO copyright laws should be rewritten to bring copyright in line with the rest of the economy.
Plumbers are not claiming use fees from the pipes they installed a decade ago. A doctor isn't getting paid by a 70-year-old for saving them in a car accident at age 50.
Why should intellectual property authors be given extreme ownership over behavior then?
In the Constitution Congress is allowed to protect with copyright "for a limited time".
The status quo of life of author + 99 years means works can be copyrighted for many people's entire lives. In effect unlimited protection.
Why is society on the hook to preserve a political norm that materially benefits so few?
Because the screen tells us the end is nigh! and giant foot will crush us! if we move on from old America. Sad and pathetic acquiescence to propaganda.
My fellow Americans; must we be such unserious people all the time?
This hypernormalized finance engineered, "I am my job! We make line go up here!" culture is a joke.
drivebyhooting
a month ago
Excuse me, but even if in principle "information wants to be free", the actual outcome of LLMs is the opposite of democratizing information and access. It completely centralizes access, censorship, and profits in the hands of a few mega corporations.
It is completely against the spirit of information wants to be free. Using that catch phrase in protection of mega corps is a travesty.
venndeezl
a month ago
LLMs are just a concept, an abstraction. A data type for storing data.
The actual problem is political. Has nothing to do with LLMs.
Larrikin
a month ago
Those are meaningless words when you know the discussion is about LLMs taking in people's intellectual property and selling it back.
venndeezl
a month ago
Nah that's still a political resource allocation problem
Don't let politics allocate resources to massive data center projects
zephen
a month ago
> LLMs are just a concept, an abstraction. A data type for storing data.
C'mon. You know good and well that what is being discussed is the _use_ of LLMs, with the concomitant heavy usage of CPU, storage, and bandwidth that the average user has no hope of matching.
Altern4tiveAcc
a month ago
> You know good and well that what is being discussed is the _use_ of LLMs
Not the person you're replying to, but I've found that some people do argue against LLMs themselves (as in, the tech, not just the usage), especially in humanities/arts circles, which seem to have a stronger feeling of panic towards LLMs.
Clarifying which one you're talking about can save a lot of typing/talking sometimes.
zephen
25 days ago
> I've found that some people do argue against LLMs themselves (as in, the tech, not just the usage), especially in humanities/arts circles, which seem to have a stronger feeling of panic towards LLMs.
Maybe?
The person I responded to said "LLMs are just a concept, an abstraction."
Were that true, were they simply words in some dusty CS textbook, it's hardly likely that the humanities/arts people you describe would even know about them.
No, it's the fact that these people have seen regurgitated pictures and words that makes it an issue.
venndeezl
a month ago
Nah it's the use of massive amounts of resources to run data centers.
A political problem.
zephen
25 days ago
If LLMs weren't sucking up all the resources, bitcoin would be.
Which, yes, is partly a resource/political problem, but there are additional arguments against the use of those resources for LLMs.
Altern4tiveAcc
a month ago
> It completely centralizes accesses, censorship, and profits in the hands of a few mega corporations.
Have the biggest models be legally forced to be released in the open for end users, then. Best of both worlds.
Wait a few years, and you'll even be able to run those models in commodity hardware.
Enshittification in order to give returns to shareholders sucks. The tech is great and empowering for the commons.
jen729w
a month ago
> In my opinion information wants to be free.
But I still need to pay rent.
venndeezl
a month ago
Well, then, like a plumber, you should string together one paid job after another, not do a job once and collect forever.
Rent is a political problem.
Perhaps invest in the courage to confront some droopy faced Boomers in Congress.
SoKamil
a month ago
The thing is, someone will collect rent from IP anyways. LLMs shift rent collecting from decentralized individuals to a handful of big tech companies.
Altern4tiveAcc
a month ago
> someone will collect rent from IP anyways
We should work on fixing that, then.
I agree with your point about big tech companies salivating at opportunities to collect rent. IP is part of the problem.
venndeezl
a month ago
Yeah they will. Because Muricans are too busy belaboring the obvious on social media rather than tackling the obvious political problems.
zephen
a month ago
> In my opinion information wants to be free.
Information has zero desires.
> It's wild to me seeing the tech world veer into hyper-capitalism and IP protectionism.
Really? Where have you been the last 50 years?
> Plumbers are not claiming use fees from the pipes they installed a decade ago.
Plumbers also don't discount the job on the hopes they can sell more, or just go around installing random pipes in random locations hoping they can convince someone to pay them.
> Why should intellectual property authors be given extreme ownership over behavior then?
The position that cultural artifacts should enter the commons sooner rather than later is not unreasonable by any means, but most software is not cultural, requires heavy maintenance for the duration of its life, and is well past obsolescence, gone and forgotten, well before the time frame you are discussing.
timcobb
a month ago
> This substantially reduces the incentive for the creation of new IP.
You say that like it's a bad thing...
tazjin
a month ago
Does anyone know of active work happening on such a license?
jsheard
a month ago
Writing the license is the easy part, the challenge is in making it legally actionable. If AI companies are allowed to get away with "nuh uh we ran it through the copyright-b-gone machine so your license doesn't count" then licenses alone are futile, it'll take lobbying to actually achieve anything.
tazjin
a month ago
Huh? Clearly writing it is not easy, as per your own comment
jsheard
a month ago
My point is that you could write the most theoretically bulletproof license in the world and it would count for nothing under the precedent that AI training is fair use, and can legally ignore your license terms. That's just not a problem that can be solved with better licenses.
hoppp
a month ago
I got an "LLM Inference Compensation MIT License (LLM-ICMIT)": a license that is MIT-compatible but requires LLM providers to pay after inference. It only restricts online providers, not self-hosted models.
HumanOstrich
a month ago
That's not MIT-compatible, it's the opposite. MIT-compatible would mean that code under your license could be relicensed to MIT. Similar to how the GPL is not MIT-compatible because you cannot relicense GPL code under MIT.
hoppp
22 days ago
Makes sense. I got the license from an LLM, so it probably didn't get it right.
glerk
a month ago
I can ask Claude to generate you one right now. It will be just a bunch of bullshit words no matter how much work you put into writing them down (like any other such license).
blitz_skull
a month ago
The idea of being able to “steal” ideas is absolutely silly.
Yeah we’ve got a legal system for it, but it always has been and always will be silly.
wnjenrbr
a month ago
In my opinion, IP is dead. Strong IP died in 2022, along with the Marxist labor theory of value, from which IP derives its (hypothetical) value. It no longer matters who did what, when, and how. The only thing that matters is that it exists, and it can be consumed, for no cost, forever.
IP is the final delusion of 19th century thinking. It was crushed once we could synthesize anything at little cost and little effort. Turns out the hard work only had to be done once, and we could automate to infinity forever.
Hold on to 19th century delusions if you wish, the future is accelerating, and you are going to be left behind.
mrtesthah
a month ago
This is a tone deaf take that ignores the massive imbalance in how IP law is wielded by large corporations vs individuals or small firms.
wnjenrbr
a month ago
No, it’s the most empowering thing humanity has ever constructed to wrestle the beast of IP and make it an irrelevant footnote in the human story. Deckchairs on the Titanic.
If one wastes their life in court, arguing 19th century myths, that’s on the players.
mrtesthah
a month ago
IP law is not going away for “little people” like us until we collectively overturn the existing political regime which grants more rights to corporations than people.
Altern4tiveAcc
a month ago
IP being used by small firms instead of large corporations does not make it a good thing. It's the same disgusting concept to deny freedom for end users to give control to who "owns" the IP.
IP as a concept needs to die.
mrtesthah
25 days ago
>IP as a concept needs to die.
Yes it does, but if you think that's going to happen on its own by allowing the largest corporations to run roughshod over everyone else, you're going to be disappointed. True freedom takes struggle.
CrimsonRain
a month ago
[flagged]
tomhow
25 days ago
> Your opinion is shit.
WTF? This is a completely unacceptable comment on HN. I don't know why you would think that is acceptable after being registered here for so long. The entire reason HN exists is to be better than that.