insane_dreamer
6 hours ago
The problem is: eventually, what are LLMs going to draw from? They're not creating new information, just regurgitating and combining existing info. That's why they perform so poorly on code for which there aren't many publicly available samples, SO/Reddit answers, etc.
zmmmmm
an hour ago
It may be an interesting side effect that people stop so gratuitously inventing random new software languages and frameworks because the LLMs don't know about them. I know I'm already leaning towards tech that the LLM can work well with, simply because being able to ask the LLM to solve 90% of the problem outweighs any marginal advantage a slightly better language or framework offers. For example, I dislike Python as a language pretty intensely, but I can't deny that the LLMs are significantly better at Python than at many other languages.
A4ET8a8uTh0
39 minutes ago
Alternatively, esoteric languages and frameworks will become even more lucrative, simply because only the person who invented them and their hardcore following will understand half of them.
Obviously, not a given, but not unreasonable given what we have seen historically.
nfw2
an hour ago
Fwiw, GPT o1 helped me figure out a fairly complex use case of epub.js, an open-source library with pretty opaque documentation and relatively few public samples. It took a few back-and-forths to get to a working solution, but it did get there.
It makes me wonder if the AI successfully found and digested obscure sources on the internet or was just better at making sense of the esoteric documentation than me. If the latter, perhaps the need for public samples will diminish.
TaylorAlexander
an hour ago
Well, Gemini completely hallucinated command-line switches on a recent question I asked it about the program “John the Ripper”.
We absolutely need public sources of truth at the very least until we can build systems that actually reason based on a combination of first principles and experience, and even then we need sources of truth for experience.
You simply cannot create solutions to new problems if your data gets too old to encompass the new subject matter. We have no systems which can adequately determine fact from fiction, and new human experiences will always need to be documented for machines to understand them.
kachapopopow
an hour ago
Experienced the same thing with a library that has no documentation and takes advantage of C++23 (latest) features.
neither_color
5 hours ago
I find that it sloppily goes back and forth between old and new methods, and as your LLM spaghetti code grows it becomes incapable of precisely adding functions without breaking existing logic. All those tech demos of it instantly creating a whole app with one or a few prompts are junk. If you don't know what you're doing, then as you keep adding features it WILL constantly switch up the way you make API calls (here's a file with 3 native fetch functions, let's install and use axios for no reason), the way you handle state, change your CSS library, etc.
{/* rest of your functions here */} - DELETED
After a while it's only safe for doing tedious things like loops and switches.
So I guess our jobs are safe for a little while longer
emptiestplace
an hour ago
Naively asking it for code for anything remotely complex is foolish, but if you do know what you're doing and understand how to manage context, it's a ridiculously potent force multiplier. I rarely ask it for anything without specifying which libraries I want to use, and if I'm not sure which library I want, I'll ask it about options and review before proceeding.
n_ary
6 hours ago
LLMs show their limits when you ask about something new (introduced in the last 6-12 months) that isn't widely used yet. I was asking Claude and GPT-4o about a new feature of Go, and they just gave me some old stuff from the Go docs. Then I went to the official Go docs and found what I was looking for anyway; the feature was released two major versions back, but somehow neither GPT-4o nor Claude knew about it.
SunlitCat
6 hours ago
With GPT-4o I had some success pointing it at the current documentation of the projects I needed and having it give me current, accurate answers.
Like "Help me to do this and that and use this list of internet resources to answer my questions"
fullstackwife
an hour ago
The answer is already known, and it is a multi-billion-dollar business: https://news.ycombinator.com/item?id=41680116
stickfigure
6 hours ago
> The problem is: eventually, what are LLMs going to draw from?
Published documentation.
I'm going to make up a number but I'll defend it: 90% of the information content of stackoverflow is regurgitated from some manual somewhere. The problem is that the specific information you're looking for in the relevant documentation is often hard to find, and even when found is often hard to read. LLMs are fantastic at reading and understanding documentation.
Const-me
6 hours ago
That is only true for trivial questions.
I've answered dozens of questions on stackoverflow.com with tags like SIMD, SSE, AVX, NEON. Only a minority of these asked for a single SIMD instruction which does something specific. Usually people ask how to use the complete instruction set to accomplish something higher level.
Documentation alone doesn't answer questions like that, you need an expert who actually used that stuff.
irunmyownemail
6 hours ago
Published documentation has been and can be wrong. In the late 1990s and early 2000s, when I still did a mix of Microsoft technologies and Java, I found several bad, non-obvious errors in MSDN documentation. AI today would likely regurgitate them in a soft, mild, but authoritative-sounding way. At least when discussing with real people, after the arrows fly and the dust settles, we can figure out the truth.
Ferret7446
4 hours ago
Everything (and everyone, for that matter) can be and has been wrong. What matters is whether it is useful. And AI as it is now is pretty decent at finding ("regurgitating") information in large bodies of data much faster than humans, and with enough accuracy to be "good enough" for most uses.
Nothing will ever replace your own critical thinking and judgment.
> At least when discussing with real people after the arrows fly and the dust settles, we can figure out the truth.
You can actually do that with AI now. I have been able to correct AI many times via a Socratic approach (where I didn't know the correct answer, but I knew the answer the AI gave me was wrong).
roughly
6 hours ago
Yeah, this is wildly optimistic.
From personal experience, I'm skeptical of the quantity and especially the quality of published documentation available, the completeness of that documentation, the degree to which it both recognizes and covers all the relevant edge cases, etc. Even Apple, which used to be quite good at that kind of thing, increasingly just refers developers to its WWDC videos. I'm also skeptical of the ability of LLMs to ingest and properly synthesize that documentation - I'm willing to bet the answers from SO and Reddit are doing more heavy lifting in shaping the LLM's "answers" than you're hoping here.
There is nothing in my couple decades of programming or my experience with LLMs that suggests published documentation will be sufficient for an LLM to produce sufficient-quality output without human synthesis somewhere in the loop.
lossolo
an hour ago
Knowledge gained from experience that isn't included in documentation is also a significant part of SO. For example: "This library will not work with service Y because of X; they do not support feature Z, as I discovered when I tried to use it myself" - or other empirical evidence about the behavior of software that isn't documented.
elicksaur
6 hours ago
Following the article's conclusion further: humans would stop producing new documentation with new concepts.
jsemrau
2 hours ago
Data annotation will be a huge business going forward.
finolex1
6 hours ago
There is still publicly available code and documentation to draw from. As models get smarter and bootstrapped on top of older models, they should need less and less training data. In theory, just providing the grammar for a new programming language should be enough for a sufficiently smart LLM to answer problems in that language.
Unlike freeform writing tasks, coding also has a strong feedback loop (i.e. does the code compile, run successfully, and output a result?), which means it is probably easier to generate synthetic training data for models.
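A minimal sketch of that feedback loop (generate_candidate() here is a hypothetical stand-in for sampling code from a model, and the test case is made up):

    # Keep only candidates that run cleanly and produce the expected
    # output; only verified samples become synthetic training data.
    import subprocess
    import sys
    import tempfile

    def passes_check(code: str, test_input: str, expected: str) -> bool:
        with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
            f.write(code)
            path = f.name
        try:
            result = subprocess.run(
                [sys.executable, path],
                input=test_input, capture_output=True, text=True, timeout=5,
            )
        except subprocess.TimeoutExpired:
            return False
        return result.returncode == 0 and result.stdout.strip() == expected

    dataset = []
    for _ in range(1000):
        code = generate_candidate()  # hypothetical: sample from the model
        if passes_check(code, test_input="3 4", expected="7"):
            dataset.append(code)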
layer8
5 hours ago
> In theory, just providing the grammar for a new programming language should be enough for a sufficiently smart LLM to answer problems in that language.
I doubt it. Take a language like Rust or Haskell or even modern Java or Python. Without prolonged experience with the language, you have no idea how the various features interact in practice, what the best practices and typical pitfalls are, what common patterns and habits have been established by its practitioners, and so on. At best, the system would have to simulate building a number of nontrivial systems using the language in order to discover that knowledge, and in the end it would still be like someone locked in a room without knowledge of how the language is actually applied in the real world.
oblio
an hour ago
> sufficiently smart LLM
Cousin of the sufficiently smart compiler? :-p
mycall
6 hours ago
I thought synthetic data is part of what's training the new multimodal large models, e.g. AlphaGeometry, o1, etc.
y7
6 hours ago
Synthetic data can never contain more information than the statistical model from which it is derived: it is simply the evaluation of a non-deterministic function on the model parameters. And the model parameters are simply a function of the training data.
I don't see how you can "bootstrap a smarter model" based on synthetic data from a previous-gen model this way. You may as well just train your new model on the original training data.
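One way to make that precise is the data processing inequality (an informal sketch; D is the original training data, theta the trained parameters, S the synthetic samples):

    % D -> theta -> S forms a Markov chain, so synthetic samples carry
    % at most as much information about D as the parameters do:
    D \to \theta \to S \quad\Longrightarrow\quad I(D; S) \le I(D; \theta)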
antisthenes
6 hours ago
Synthetic data without some kind of external validation is garbage.
E.g. you can't just synthetically generate code; something or someone needs to run it and see if it performs the functions you actually asked of it.
You need to feed the LLM output into some kind of formal verification system, and only then add it back to the synthetic training dataset.
Here, for example - dumb recursive training causes model collapse:
jneagu
6 hours ago
Yeah, there was a reference in a paywalled article a year ago (https://www.theinformation.com/articles/openai-made-an-ai-br...): "Sutskever's breakthrough allowed OpenAI to overcome limitations on obtaining high-quality data to train new models, according to the person with knowledge, a major obstacle for developing next-generation models. The research involved using computer-generated, rather than real-world, data like text or images pulled from the internet to train new models."
I suspect most foundational models are now knowingly trained on at least some synthetic data.
epgui
6 hours ago
In a very real sense, that’s also how human brains work.
elicksaur
6 hours ago
This argument always conflates simple processes with complex ones. Humans can work with abstract concepts at a level LLMs currently can't, and don't seem likely to become capable of. "True" and "False" are the best examples.
epgui
6 hours ago
It doesn’t conflate anything though. It points to exactly that as a main difference (along with comparative functional neuroanatomy).
It's helpful to realize the ways in which we do work the same way as AI, because it gives us perspective on ourselves.
(I don’t follow regarding your true and false statement, and I don’t share your apparent pessimism about the fundamental limits of AI.)
empath75
6 hours ago
AI companies are already paying humans to produce new data to train on, and will continue to do that. There are also additional modalities - they've already added text, video, and audio, and there are probably more possible. Right now almost all the content being fed into these AIs is stuff that humans can sense and understand, but why does it have to limit itself to that? There are probably all kinds of data types it could train on that could give it more knowledge about the world.
Even limiting yourself to code generation, there are going to be a lot of software developers employed to write or generate code examples and documentation just for AIs to ingest.
I think eventually AIs will begin coding in programming languages that are designed for AI to understand and work with and not for people to understand.
imoverclocked
5 hours ago
> AI companies are already paying humans to produce new data to train on and will continue to do that.
The sheer difference in scale between "here are all the people in the world who have shared data publicly until now" and "here is the relatively tiny population of people being paid to add new information to an LLM" dooms the LLM to become outdated in an information-hoarding society. So the question in my mind is: "Why will people keep producing public information just for it to be devalued into LLMs?"
manmal
42 minutes ago
How would a custom language differ from what we have now?
If you mean obfuscation, then yeah, maybe that makes sense to fit more into the window. But it’s easy to unobfuscate, usually.
Otherwise, I'm not sure what the goal of an LLM-specific language could be. I don't feel most languages have been made purely to accommodate humans anyway; they balance a lot of factors, like being true to the metal (like C), functional purity (Haskell), or fault tolerance (Erlang). I'm not sure what "being for LLMs" could look like.
jneagu
6 hours ago
Edit: OP had actually qualified their statement to refer to only underrepresented coding languages. That's 100% true - LLM coding performance is super biased in favor of well-represented languages, esp. in public repos.
Interesting - I actually think they perform quite well on code, considering that code has a set of correct answers (unlike most other tasks we use LLMs for on a daily basis). GitHub Copilot had a 30%+ acceptance rate (https://github.blog/news-insights/research/research-quantify...). How often does one accept the first answer that ChatGPT returns?
To answer your first question: new content is still being created in an LLM-assisted way, and a lot of it can be quite good. The rate of that happening is a lot lower than that of LLM-generated spam - this is the concerning part.
generic92034
6 hours ago
The OP qualified "code" as code with poor availability of samples online. My experience with LLMs on a proprietary language with little online presence confirms their statement. In many cases it is not even worth trying.
jneagu
6 hours ago
Fair point - I actually had parsed OP's sentence differently. I'll edit my comment.
I agree, LLMs' performance on coding tasks is super biased in favor of well-represented languages. I think this is what GitHub is trying to solve with custom private models for Copilot, but I expect that to be enterprise-only.