TFNA
9 hours ago
I’m a researcher who for years has been scanning my library’s holdings on my particular discipline for my own use, but also uploading the books to the shadow libraries for everyone else’s benefit. The revelation that LLMs are training on the shadow libraries has made me put a lot more effort into ensuring my scans are well-OCRed. The idea that I could eventually ask ChatGPT or whatever about obscure things in my field, and get useful output (of the "trust but verify" sort), is exciting.
lelanthran
2 hours ago
> The idea that I could eventually ask ChatGPT or whatever about obscure things in my field, and get useful output (of the "trust but verify" sort), is exciting.
That's your idea, not the one they are going with.
Their idea is that you pay a fee to access any information that was freely available.
Your idea is tearing down fences; their idea is gatekeeping. The two ideas are incompatible.
baq
2 hours ago
Their idea is being able to get answers to questions which were difficult to answer before[0]. Of course they want to get paid for it. The information wasn’t available easily and not always[1] freely.
[0] among other things…
[1] more like ‘often not at all’
entrox
an hour ago
> Of course they want to get paid for it.
So should the original authors, no? That is, getting a share of that payment.
Something akin to the German GEMA could work, an entity that levies a usage fee on behalf of all copyright holders and re-distributes to its members, but on a global scale.
inetknght
an hour ago
> So should the original authors, no? That is, getting a share of that payment.
Should they? Yes. Will they?
Well, do LLM model builders pay for any copyrighted work so far?
tokai
33 minutes ago
Hasn't that been scanned by Google already? Their model should be trained on most of those texts already.
BrenBarn
8 hours ago
How about the idea that you might have to eventually pay an AI company a large amount of money to ask ChatGPT such a question, while the library itself has lost funding?
BugsJustFindMe
8 hours ago
Library funding is a political stance that has only an imaginary connection to whether people pay to ask things of ChatGPT. People can pay to talk to an AI and government can also fund libraries.
soco
an hour ago
The government can then soon "optimize" and fund exactly one library.
bakugo
5 hours ago
Do you believe it makes sense for the government to fund libraries that almost nobody uses because they'd rather ask ChatGPT?
stingraycharles
3 hours ago
If people prefer to pay ChatGPT, rather than going to the library for free, and ChatGPT sources content from libraries, then sure that makes sense, especially if the information contained is of cultural relevance to the government.
It’s the same as asking “should you release open source software knowing that AI companies are training on it?” I could absolutely not care less; that’s not why I release my software to the public at all.
indigo945
5 hours ago
People are already not using libraries because they'd rather rot their brains on TikTok than read a book. (Also, for information lookup, the internet and search engines exist, and have for a while now.) This has no actual causal relation.
snaking0776
2 hours ago
People is a broad term. Outside of major cities (where I live) libraries provide an essential service for parents and their children and serve as a free communal space for the broader community. Our libraries are always full and a large part of the health of our area.
breezybottom
an hour ago
Weird that my local library is always full.
Tangurena2
an hour ago
Libraries in my state also lend out tools.
A recent executive order prohibits libraries (among other non-profits) from processing US passport applications. While county clerks (in my state) along with a small number of post office locations also offer this service, the libraries were doing it for free as opposed to charging $50-ish (like the post office or county clerks).
Why might the passport issue be important? The SAVE Act (passed the House of Representatives last year and sitting before the Senate) only permits 4 identification items to register to vote for Federal elections:
1 - A US Passport (costs about $100 to renew, about $150 for first time).
2 - A US Military ID that has proof of US citizenship (CAC cards show this with a white background behind your name - yellow or blue for contractors or non-US citizens). IDs for retirees don't show citizenship.
3 - A REAL ID compliant driving license that has proof of US citizenship. Also called "Enhanced Driving License", on the front it has a US flag and the back looks like the page on your passport with those funny letters. Only 5 states offer this as an extra $30-40 on top of the regular driving license fee.
4 - A REAL ID compliant driving license/ID and certified birth certificate and the names must match exactly. This means that 74 million women who took their husbands' name will not be voting in Federal Elections. Also, no transgender people can vote.
The SAVE Act also requires voter registration agencies to send voter rolls to DHS every month. And every month DHS can throw people off the voter rolls with no warning, notice, or recourse. One can easily imagine this being done right before elections, with people who registered for the "wrong" political party thrown off the rolls after the deadline to register.
Project 2025 wants to repeal the 19th Amendment. Throwing 74 million women off the voter rolls is just a start.
Links:
SAVE Act text - https://www.congress.gov/bill/119th-congress/house-bill/22/t...
https://www.congress.gov/bill/119th-congress/house-bill/22/t...
roenxi
3 hours ago
1. Being offered a service you would pay a lot of money for is a step forward. When people pay a large amount of money for something that means they wanted the thing more than the money. The link between ChatGPT and libraries being under threat seems a bit weak too.
2. The Chinese have been investing a lot into free models, they're perfectly good and keep improving; despite the best efforts of the US. They're even ramping into making their own hardware. Gemma 4 is pretty snappy too. It doesn't seem like there is much of a moat to this, my guess is there will be perfectly good local models if you want to avoid AI companies.
cheschire
3 hours ago
When people pay a large amount of money for something, that means they wanted the thing more than another thing. Money just provides the method to defer value transfer.
When the person paying the money is rich, the other thing they are foregoing is typically not a life necessity. When the person is poor, however, it typically is.
spoaceman7777
8 hours ago
Free, downloadable AI models have consistently caught up to ChatGPT within 3 months, for almost a year now.
I highly encourage you to go and update your priors.
roygbiv2
4 hours ago
And how much does the hardware cost to run said models?
Lerc
10 minutes ago
It can be quite expensive to get the models and machines to do this.
That's what the money pays for when the comment above mentions 'that you might have to eventually pay an AI company a large amount of money to ask ChatGPT such a question'.
Putting aside that it won't be a large amount of money for any particular query, that's how the AI companies see themselves: not as providers of information, but as providers of mechanisms that provide information. They are not selling the information of others; they aren't selling information at all. They are selling the service of running the mechanism.
dboreham
4 hours ago
You can run them slowly on any machine that has enough memory.
JKCalhoun
29 minutes ago
And, to bolster your comment, you can still use this machine as your daily driver.
I'm always going to have a machine anyway—might as well max out the RAM when I purchase another.
(And so too I jumped on the Mac mini bandwagon a month or two back, 64 GB. I'm enjoying pulling down the new models and putting them through their paces.)
fragmede
4 hours ago
How good do you want it to be? For something close to ChatGPT today (April 2026), you're still looking at a system with 7xH200 + chassis, which will run you $300k, or a GB200 NVL72, which is $2-3 million. OTOH, a quantized Qwen3.6 model can be run on $10,000 (high-end Mac) or $1,000 (Mac mini) worth of hardware. Even a Pixel 10 Pro cellphone ($1,000) can run useful models locally.
dzink
2 hours ago
Go to OpenRouter and give an investigative prompt of your own, one that fits your needs, to all the top open models. See how they do. Then check whether you can run any of those locally. Repeat at least once a month.
JKCalhoun
22 minutes ago
Thanks, BTW, now I have learned about OpenRouter.
It doesn't look like they have a way to filter down to "open" models. By this of course I mean "downloadable, local models".
I suppose if I know the "family" (Gemma, Qwen, etc.), I can just go to those models and test…
I've simply been pulling down what is popular from the LM Studio front end (and what runs on my hardware) and testing in situ.
woctordho
6 hours ago
A digital library needs almost no funding. With today's decentralized networking infrastructure such as BitTorrent and IPFS, I bet it will just exist forever.
x-complexity
6 hours ago
> A digital library needs almost no funding.
Clarification:
Maintaining the library still requires resources & effort. It only appears to need no funding because the donors of said (disk space / bandwidth / dev effort) are subsidizing it in aid of a goal they believe in (i.e. the church model).
Tangurena2
an hour ago
The way public libraries currently "lend" digital books is that they can only lend titles for a certain amount of time before the library has to repurchase the title (or remove it from circulation).
tardedmeme
6 hours ago
How much of Anna's Archive are you seeding?
woctordho
6 hours ago
About 4 TB at hand
TFNA
8 hours ago
Some people might have to pay a large amount of money to ask a commercial LLM, but advances in this space mean that if I have the data myself on my own computer, or can download it from a shadow library, I might eventually be able to ask everything locally for free.
> while the library itself has lost funding
Libraries are inherent parts of universities. While their precise role evolves, do you think that they will just be done away with? Already a substantial amount of scholarship in disciplines other than my own has moved online (legally), and the library is still there.
protocolture
7 hours ago
How about the idea that one day you might be paying a subscription to use a service while non sequitur.
locknitpicker
8 hours ago
> How about the idea that you might have to eventually pay an AI company a large amount of money to ask ChatGPT such a question, while the library itself has lost funding?
There are plenty of free models with RAG support. Why do you believe everything starts and ends with a major corporation charging a subscription?
altmanaltman
7 hours ago
How is any of that legal? Can you just take books from the library and then scan and upload digital copies? How do you deal with the ethics of this personally, stealing to make it easier for AI to steal so AI gets better? Does calling yourself a "researcher" make you feel like it's actually something worthwhile you're doing?
x-complexity
6 hours ago
> How do you deal with the ethics of this personally, stealing to make it easier for AI to steal so AI gets better?
If the obscure book/text is permanently lost forever under your stringent advice of "no stealing under any circumstances", would the "stealing" have saved it? If so, is it ethical to prevent others from accessing the book/text, under your guise of "preventing stealing"?
GaryBluto
7 hours ago
> How do you deal with the ethics of this personally, stealing to make it easier for AI to steal so AI gets better?
By quoting your comment in my reply, have I "stolen" your comment?
fragmede
5 hours ago
By reading this comment you have entered into a legal contract, by which you owe me $5. Failure to pay will be reported to the Internet police.
granabluto
5 hours ago
First, it's called infringement, not stealing. It's a custom-defined term in a custom-defined law.
Second, it is totally legal to read the book in a public library, for free, right now.
Third, laws can change. Current copyright law was pushed to 90+ years by one company (Disney), for its benefit, and can be redesigned/pushed back by AI companies, for their benefit.
A 2 year copyright duration sounds like a good compromise.
TFNA
7 hours ago
As a researcher, the main worthwhile thing that I am doing is publishing research, but having all this prior scholarship at hand 24/7 definitely makes it easier to produce said publications. And if I have created a scan, why not help out my colleagues, too?
"Deal with the ethics", seriously? You might want to learn about how heavily shadow libraries are used across academia now. It’s no longer just disadvantaged scholars in the developing world relying on pirated scans because they don’t have good libraries. It’s increasingly everyone everywhere, because today’s shadow libraries can be faster and more convenient than even one’s own institution’s holdings. At conferences, if the presenter mentions a particularly interesting publication, you can sometimes watch several people in the room immediately open LibGen or Anna’s Archive on their laptop to download it right there and then.
subscribed
4 hours ago
It's not stealing, it's uploading without the licence. Laws in many countries allow for the lawful download of such books, regardless of how they were uploaded.
Separately, laws aren't always sensible or right: slavery was legal, child marriage was legal, not paying taxes on billions of profits is legal while not paying taxes on £1000 is illegal, reporting Jews to Nazis was mandatory, etc., etc.
felooboolooomba
6 hours ago
> How is any of that legal?
He didn't mention legality. The world is rigged, as you can see from a head of state participating in both the running and the cover-up of history's largest CSE. Watch what people are doing in addition to what they are saying.
I for one am tremendously thankful for TFNA's efforts, since I get access to knowledge that I wouldn't have been able to before.
tardedmeme
6 hours ago
AI training is legal because the supreme court said so.
woctordho
6 hours ago
Copyright is a property right, and a property right is what we call a bourgeois legal right. It will cease to exist as productive forces like AI develop.
breezybottom
an hour ago
Imagine thinking Sam Altman and Elon Musk are your comrades.
woctordho
14 minutes ago
Sure. There's a saying that Marxism is not the thought of Marx alone. Sam Altman is also just a representative of those who contribute to and benefit from the AI community.
__alexs
6 hours ago
You can't steal information, don't be silly. You can just not have permission to copy it. Oh no.
emsign
6 hours ago
That's a slave mentality. You are aware that OpenAI charges money for other people's work and intelligence, right? Your own and that of other volunteer pirates and of the original authors as well. I don't get people like you at all.
TFNA
6 hours ago
I’ve already posted in this thread about how even if OpenAI charges money for its LLM trained on the literature, that doesn’t change the fact that the literature remains available to everyone through the shadow libraries, and advances in AI mean that one can increasingly work with it locally on one’s own computer.
__alexs
6 hours ago
Open-weight models exist and are critical to us avoiding a future where you have to pay sama a slice of every engineer's salary.
wallst07
3 hours ago
>I don't get people like you at all.
Because you don't try, which says more about you than OP. It's a major problem with society.