Large language models reduce public knowledge sharing on online Q&A platforms

302 points | posted 11 hours ago
by croes

262 Comments

insane_dreamer

6 hours ago

The problem is: eventually, what are LLMs going to draw from? They're not creating new information, just regurgitating and combining existing info. That's why they perform so poorly on code for which there aren't many publicly available samples, SO/reddit answers, etc.

zmmmmm

an hour ago

It may be an interesting side effect that people stop so gratuitously inventing random new software languages and frameworks, because the LLMs don't know about them. I know I'm already leaning towards tech that the LLM can work well with, simply because being able to ask the LLM to solve 90% of the problem outweighs any marginal advantage a slightly better language or framework offers. For example, I dislike Python as a language pretty intensely, but I can't deny that the LLMs are significantly better at Python than at many other languages.

A4ET8a8uTh0

39 minutes ago

Alternatively, esoteric languages and frameworks will become even more lucrative, simply because only the person who invented them and their hardcore following will understand half of it.

Obviously, not a given, but not unreasonable given what we have seen historically.

nfw2

an hour ago

Fwiw, GPT o1 helped me figure out a fairly complex use case of epub.js, an open-source library with pretty opaque documentation and relatively few public samples. It took a few back-and-forths to get to a working solution, but it did get there.

It makes me wonder if the AI successfully found and digested obscure sources on the internet or was just better at making sense of the esoteric documentation than me. If the latter, perhaps the need for public samples will diminish.

TaylorAlexander

an hour ago

Well, Gemini completely hallucinated command-line switches on a recent question I asked it about the program "john the ripper".

We absolutely need public sources of truth at the very least until we can build systems that actually reason based on a combination of first principles and experience, and even then we need sources of truth for experience.

You simply cannot create solutions to new problems if your data gets too old to encompass the new subject matter. We have no systems which can adequately distinguish fact from fiction, and new human experiences will always need to be documented for machines to understand them.

kachapopopow

an hour ago

Experienced the same thing with a library that has no documentation and takes advantage of C++23 (latest) features.

neither_color

5 hours ago

I find that it sloppily goes back and forth between old and new methods, and as your LLM spaghetti code grows it becomes incapable of precisely adding functions without breaking existing logic. All those tech demos of it instantly creating a whole app with one or a few prompts are junk. If you don't know what you're doing, then as you keep adding features it WILL constantly switch up the way you make API calls (here's a file with 3 native fetch functions, let's install and use axios for no reason), the way you handle state, your CSS library, etc.

{/* rest of your functions here */} - DELETED

After a while it's only safe for doing tedious things like loops and switches.
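To make the switching concrete, here's a hypothetical Python analog of the churn described above (the original complaint is about fetch vs. axios in JavaScript; api.example.com is a placeholder, and requests is a third-party dependency the LLM just decided to pull in):

```python
import json
from urllib.request import urlopen

# Existing style: the rest of the codebase fetches JSON with the standard library.
def get_user(user_id: int) -> dict:
    with urlopen(f"https://api.example.com/users/{user_id}") as resp:
        return json.load(resp)

# LLM-added function: same job, but it silently introduces a new dependency
# and a second way of doing HTTP in the same file.
import requests

def get_order(order_id: int) -> dict:
    resp = requests.get(f"https://api.example.com/orders/{order_id}")
    resp.raise_for_status()
    return resp.json()
```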

So I guess our jobs are safe for a little while longer

emptiestplace

an hour ago

Naively asking it for code for anything remotely complex is foolish, but if you do know what you're doing and understand how to manage context, it's a ridiculously potent force multiplier. I rarely ask it for anything without specifying which libraries I want to use, and if I'm not sure which library I want, I'll ask it about options and review before proceeding.

n_ary

6 hours ago

LLMs show their limits when you ask about something new (introduced in the last 6-12 months) that isn't widely used yet. I was asking Claude and GPT-4o about a new feature of Go, and they just gave me some old stuff from the Go docs. Then I went to the official Go docs and found what I was looking for anyway. The feature was released two major versions back, but somehow neither GPT-4o nor Claude knew about it.

SunlitCat

6 hours ago

With GPT-4o I had some success pointing it to the current documentation of projects I needed, and had it give me current, accurate answers.

Like "Help me to do this and that and use this list of internet resources to answer my questions"

stickfigure

6 hours ago

> The problem is: eventually, what are LLMs going to draw from?

Published documentation.

I'm going to make up a number but I'll defend it: 90% of the information content of stackoverflow is regurgitated from some manual somewhere. The problem is that the specific information you're looking for in the relevant documentation is often hard to find, and even when found is often hard to read. LLMs are fantastic at reading and understanding documentation.

Const-me

6 hours ago

That is only true for trivial questions.

I've answered dozens of questions on stackoverflow.com with tags like SIMD, SSE, AVX, NEON. Only a minority of these asked for a single SIMD instruction which does something specific. Usually people ask how to use the complete instruction set to accomplish something higher level.

Documentation alone doesn't answer questions like that, you need an expert who actually used that stuff.

irunmyownemail

6 hours ago

Published documentation has been and can be wrong. In the late 1990s and early 2000s, when I still did a mix of Microsoft technologies and Java, I found several bad, non-obvious errors in MSDN documentation. AI today would likely regurgitate them in a mild but authoritative-sounding way. At least when discussing with real people, after the arrows fly and the dust settles, we can figure out the truth.

Ferret7446

4 hours ago

Everything (and everyone, for that matter) can be and has been wrong. What matters is whether it is useful. And AI as it is now is pretty decent at finding ("regurgitating") information in large bodies of data much faster than humans, and with enough accuracy to be "good enough" for most uses.

Nothing will ever replace your own critical thinking and judgment.

> At least when discussing with real people after the arrows fly and the dust settles, we can figure out the truth.

You can actually do that with AI now. I have been able to correct AI many times via a Socratic approach (where I didn't know the correct answer, but I knew the answer the AI gave me was wrong).

roughly

6 hours ago

Yeah, this is wildly optimistic.

From personal experience, I'm skeptical of the quantity and especially the quality of published documentation available, the completeness of that documentation, the degree to which it both recognizes and covers all the relevant edge cases, etc. Even Apple, which used to be quite good at that kind of thing, has increasingly just referred developers to its WWDC videos. I'm also skeptical of the ability of LLMs to ingest and properly synthesize that documentation - I'm willing to bet the answers from SO and Reddit are doing more heavy lifting in shaping the LLM's "answers" than you're hoping here.

There is nothing in my couple decades of programming or experience with LLMs that suggests to me that published documentation is going to be sufficient to let an LLM produce sufficient-quality output without human synthesis somewhere in the loop.

lossolo

an hour ago

Knowledge gained from experience that is not included in documentation is also a significant part of SO. For example: "This library will not work with service Y because of X; they do not support feature Y, as I discovered when I tried to use it myself" - or other empirical evidence about the behavior of software that isn't documented.

elicksaur

6 hours ago

Following the article’s conclusion further: humans would stop producing new documentation with new concepts.

jsemrau

2 hours ago

Data annotation is a thing that will be a huge business going forward.

mondrian

2 hours ago

Curious about this statement, do you mind expanding?

oblio

an hour ago

I'm also curious. For folks who've been around, the semantic web, which was all about data annotation, failed horribly. Nobody wants to do it.

finolex1

6 hours ago

There is still publicly available code and documentation to draw from. As models get smarter and bootstrapped on top of older models, they should need less and less training data. In theory, just providing the grammar for a new programming language should be enough for a sufficiently smart LLM to answer problems in that language.

Unlike freeform writing tasks, coding also has a strong feedback loop (i.e. does the code compile, run successfully, and output a result?), which means it is probably easier to generate synthetic training data for models.
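A minimal sketch of that feedback loop, assuming you verify each generated sample by actually executing it (the two candidates and the test are toy stand-ins for model output and a generated test harness):

```python
import subprocess
import sys
import tempfile

def runs_clean(candidate: str, test: str) -> bool:
    """Execute candidate + test in a subprocess; exit code 0 means the test passed."""
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(candidate + "\n" + test)
        path = f.name
    result = subprocess.run([sys.executable, path], capture_output=True, timeout=30)
    return result.returncode == 0

# Two model outputs for "write add(a, b)" - one correct, one plausible but wrong.
candidates = [
    "def add(a, b):\n    return a + b",
    "def add(a, b):\n    return a - b",
]
test = "assert add(2, 3) == 5"

# Only the verified sample would be kept as synthetic training data.
training_set = [c for c in candidates if runs_clean(c, test)]
print(len(training_set))  # -> 1
```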

layer8

5 hours ago

> In theory, just providing the grammar for a new programming language should be enough for a sufficiently smart LLM to answer problems in that language.

I doubt it. Take a language like Rust or Haskell or even modern Java or Python. Without prolonged experience with the language, you have no idea how the various features interact in practice, what the best practices and typical pitfalls are, what common patterns and habits have been established by its practitioners, and so on. At best, the system would have to simulate building a number of nontrivial systems using the language in order to discover that knowledge, and in the end it would still be like someone locked in a room without knowledge of how the language is actually applied in the real world.

oblio

an hour ago

> sufficiently smart LLM

Cousin of the sufficiently smart compiler? :-p

mycall

6 hours ago

I thought synthetic data is what is partially training the new multimodal large models, e.g. AlphaGeometry, o1, etc.

y7

6 hours ago

Synthetic data can never contain more information than the statistical model from which it is derived: it is simply the evaluation of a non-deterministic function on the model parameters. And the model parameters are simply a function of the training data.

I don't see how you can "bootstrap a smarter model" based on synthetic data from a previous-gen model this way. You may as well just train your new model on the original training data.
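One way to make this precise: if the synthetic data S is generated from parameters θ that were fit to the original training data D, then D → θ → S is a Markov chain, and the data processing inequality bounds what the synthetic data can carry:

```latex
D \longrightarrow \theta \longrightarrow S
\qquad \Longrightarrow \qquad
I(D; S) \le I(D; \theta)
```

The synthetic samples can tell a new model no more about the original data than the old model's parameters already do.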

antisthenes

6 hours ago

Synthetic data without some kind of external validation is garbage.

E.g. you can't just synthetically generate code; something or someone needs to run it and see if it performs the functions you actually asked of it.

You need to feed the LLM output into some kind of formal verification system, and only then add it back to the synthetic training dataset.

Here, for example - dumb recursive training causes model collapse:

https://www.nature.com/articles/s41586-024-07566-y

jneagu

6 hours ago

Anecdotally, synthetic data can get good if the generation involves a nugget of human labels/feedback that gets scaled up w/ a generative process.

HPsquared

3 hours ago

There are definitely a lot of wrong ways to do it. Doesn't mean the basic idea is unsound.

jneagu

6 hours ago

Yeah, there was a reference in a paywalled article a year ago (https://www.theinformation.com/articles/openai-made-an-ai-br...): "Sutskever's breakthrough allowed OpenAI to overcome limitations on obtaining high-quality data to train new models, according to the person with knowledge, a major obstacle for developing next-generation models. The research involved using computer-generated, rather than real-world, data like text or images pulled from the internet to train new models."

I suspect most foundational models are now knowingly trained on at least some synthetic data.

epgui

6 hours ago

In a very real sense, that’s also how human brains work.

elicksaur

6 hours ago

This argument always conflates simple processes with complex ones. Humans can work with abstract concepts at a level LLMs currently can’t and don’t seem likely capable of. “True” and “False” are the best examples.

epgui

6 hours ago

It doesn’t conflate anything though. It points to exactly that as a main difference (along with comparative functional neuroanatomy).

It’s helpful to realize the ways in which we do work the same way as AI, because it gives us perspective unto ourselves.

(I don’t follow regarding your true and false statement, and I don’t share your apparent pessimism about the fundamental limits of AI.)

empath75

6 hours ago

AI companies are already paying humans to produce new data to train on, and they will continue to do that. There are also additional modalities - they've already added text, video, and audio, and there are probably more possible. Right now almost all the content being fed into these AIs is stuff that humans can sense and understand, but why does it have to limit itself to that? There are probably all kinds of data types it could train on that could give it more knowledge about the world.

Even limiting yourself to code generation, there are going to be a lot of software developers employed to write or generate code examples and documentation just for AIs to ingest.

I think eventually AIs will begin coding in programming languages that are designed for AI to understand and work with and not for people to understand.

imoverclocked

5 hours ago

> AI companies are already paying humans to produce new data to train on and will continue to do that.

The sheer difference in scale between "here are all the people in the world who have shared data publicly until now" and "here is the relatively tiny population of people being paid to add new information to an LLM" dooms the LLM to become outdated in an information-hoarding society. So the question in my mind is: why will people keep producing public information just for it to be devalued into LLMs?

manmal

42 minutes ago

How would a custom language differ from what we have now?

If you mean obfuscation, then yeah, maybe that makes sense to fit more into the window. But it’s easy to unobfuscate, usually.

Otherwise, I'm not sure what the goal of an LLM-specific language could be. I don't feel most languages have been made purely to accommodate humans anyway; they balance a lot of factors, like being true to the metal (C), functional purity (Haskell), or fault tolerance (Erlang). I'm not sure what "being for LLMs" would look like.

jneagu

6 hours ago

Edit: OP had actually qualified their statement to refer to only underrepresented coding languages. That's 100% true - LLM coding performance is super biased in favor of well-represented languages, esp. in public repos.

Interesting - I actually think they perform quite well on code, considering that code has a set of correct answers (unlike most other tasks we use LLMs for on a daily basis). GitHub Copilot had a 30%+ acceptance rate (https://github.blog/news-insights/research/research-quantify...). How often does one accept the first answer that ChatGPT returns?

To answer your first question: new content is still being created in an LLM-assisted way, and a lot of it can be quite good. The rate of that happening is a lot lower than that of LLM-generated spam - this is the concerning part.

generic92034

6 hours ago

The OP has qualified "code" with bad availability of samples online. My experience with LLMs on a proprietary language with little online presence confirms their statement. It is not even worth trying, in many cases.

jneagu

6 hours ago

Fair point - I actually had parsed OP's sentence differently. I'll edit my comment.

I agree, LLM performance on coding tasks is super biased in favor of well-represented languages. I think this is what GitHub is trying to solve with custom private models for Copilot, but I expect that to be enterprise-only.

okoma

7 hours ago

The authors claim that LLMs are reducing public knowledge sharing and that the effect is not merely displacing duplicate, low-quality, or beginner-level content.

However their claim is weak and the effect is not quite as sensational as they make it sound.

First, they present only Figure 3, not regression results, for their suggested tests of LLMs being substitutes for bad-quality posts. In contrast, they do report tests for their rather arbitrary classification by user experience (where someone is "experienced" if they posted 10 times). Now, why would they omit tests by post quality but show results for an arbitrary bucketing of user "experience"?

Second, their own Figure 3 "shows" a change in trends for good and neutral questions. Good questions were trending down and are now flat, and neutral questions (arguably the noise) went from an uptrend to flat. Bad questions continue to go down, with no visible change in the trend. This suggests the opposite, i.e. that LLMs are in fact substituting for bad-quality content.

I suspect the paper needed a stronger conclusion, and research doesn't reward meticulous but unsurprising results. Hence the sensational title and the somewhat redacted results.

Yacovlewis

2 hours ago

What if LLMs are effective enough at assisting coders that they're spending less time on SO and instead pushing more open source code, which is more valuable for everyone?

BolexNOLA

7 hours ago

While this article doesn’t really seem to be hitting what I am about to say, I think someone on HN a while back described a related phenomenon (which leads to the same issue) really well. The Internet is Balkanizing. This is hardly a new concept but they were drilling down specifically into online communities.

People are electing not to freely share information on public forums like they used to. They are retreating into Discord and other services where they can dig moats and raise the drawbridges. And who can blame them? So many forums and social media sites are engaging in increasingly hostile design and monetization practices; AI/LLMs are crawling everywhere, vacuuming up everything, then putting it behind paywalls and ruining the original sources' ability to be found in search; algorithms designed to create engagement foster vitriol and controversy; the list goes on. HN is a rare exception these days.

So what happens? A bunch of people with niche interests or knowledge sets congregate into private communities and only talk to each other. Which makes it harder for new people to join. It’s a sad state of affairs if you ask me.

Simran-B

5 hours ago

Yes, it's sad. On the other hand, I think it's a good thing that people share knowledge less, publicly and free of charge on the web, because there is so much exploitation going on. Big corporations obviously capitalize on the good will of people with their LLMs, but there are also others who take advantage of the ones who want to help. A lot of users seemingly expect others to solve their problems for free and don't even put any effort into asking their questions. It's a massive drain on energy and enthusiasm; some even suffer from burnout (I assume more in open-source projects than on SO, but still). I'd rather have it be harder to connect with people sharing the same passion "in private" than have outsiders who don't contribute anything profit off of activities happening in the open. This, frustratingly, appears to be becoming the main reason for corporate open source these days.

verdverm

8 hours ago

For me, many of my questions about open source projects have moved to GitHub and Discord, so there is platform migration besides LLMs. I also tend to start with Gemini for more general programming things, because (1) it will answer in the terms of my problem instead of me having to visit multiple pages to piece things together, and (2) when it's wrong, I often get better jumping-off points for searching. Either way, LLMs save me time over clicking through to SO multiple times because the title is close but the content has an important difference.

joshdavham

8 hours ago

> many of my questions about open source projects have moved to GitHub and Discord

Exact same experience here. Plus, being able to talk to maintainers directly has been great!

kertoip_1

2 hours ago

Both of those platforms make answers harder to find. For me, a person used to getting the correct answer on Stack Overflow right away, scrolling through endless GitHub discussions is a nightmare. Aren't we just moving backwards?

klabb3

5 hours ago

No doubt that discord has struck a good balance. Much better than GitHub imo. Both for maintainers to get a soft understanding of their users, and equally beneficial for users who can interact casually without being shamed for filing an issue the wrong way.

There’s some weird blind spot with techies who are unable to see the appeal. UX matters, in a "the medium is the message" kind of way. Also, GitHub is only marginally more open than Discord. It's indexable at the moment, yes, but it would not surprise me at all if MS makes an offensive move to protect "their" (read: our) data from AI competitors.

verdverm

4 hours ago

Chat is an important medium, especially as new generations of developers enter the field (they are more chat native). It certainly offers a more comfortable, or appropriate place, to ask beginner questions, or have quick back-n-forths, than GitHub issues/discussions offers. I've always wondered why GH didn't incorporate chat, seems like a big missed opportunity.

joshdavham

an hour ago

> I've always wondered why GH didn't incorporate chat

I've been wondering the same thing recently. It's really inefficient for me to communicate with my fellow maintainers through GitHub discussions, issues, and pull request conversations, so my go-to has been private Discord conversations. This is itself kind of inefficient, since most open source repos will always have a bigger community on GitHub than on Discord (not to mention that it's a hassle when some maintainers are Chinese and don't have access to Discord...).

baq

7 hours ago

2022: Discord is not indexed by search engines, it sucks

2024: Discord is not indexed by AI slop generators, it's great

verdverm

5 hours ago

It's more that Discord is replacing Slack as the place where community happens. It's less about indexing, which still sucks even in Discord search. Slack/Salesforce threw a lot of small projects under the bus post-acquisition, with the reduction of free history from a message-count limit to 90 days.

throwaway918299

4 hours ago

Discord stores trillions of messages. If they haven’t figured out how to make a slop generator out of it yet, I’m sure it’s coming soon.

rkncland

9 hours ago

Of course people reduce their free contributions to Stack Overflow. Stack Overflow is selling them out with the OpenAI API agreement and countless "AI" hype blog posts.

kertoip_1

an hour ago

I don't think that's the main reason. People don't care whether someone is selling stuff they create on a platform. Big social media has been doing it for many years now (e.g. Facebook), and yet it's still there. You come to SO for answers; why would you care that someone is teaching some LLM on them later?

pessimizer

an hour ago

> You come to SO for answers, why would you care that someone is teaching some LLM on them later?

This doesn't make the slightest bit of sense. The people who would be concerned are the ones who are providing answers. They are not coming to SO solely to get answers.

jeremyjh

9 hours ago

I think this is more about a drop in questions, than a drop in answers.

bryanrasmussen

8 hours ago

I mean, part of the reason not to ask stuff on SO is the rules. There are several types of questions that one might like to ask, such as:

I don't know the first thing about this thing, help me get to where I know the first thing. This is not allowed any more.

I want to know the pros and cons of various things compared. This is not allowed.

I have quality questions regarding an approach that I know how to do, but I want to know better ways. This is generally not allowed but you might slip through if you ask it just right.

I pretty much know really well what I'm doing, but I'm having some difficulty finding the right documentation on some little thing; help me - this is allowed.

Something does not work as per the documentation, help me, this is allowed

I think I have done everything right but it is not working, this is allowed and is generally a typo or something that you have put in the wrong order because you're tired.

At any rate, the ones that are not allowed are the only questions that are worth asking.

The last two that are allowed I generally find get answered in the asking - I'm pretty good in the fields I ask in, and the rigor of making something match SO question requirements leads me to the answer.

If I ask one of the interesting disallowed questions and get shit on, then I will probably go through a period of "screw it, I will just look extra hard for the documentation" before I bother with that site again.

jakub_g

7 hours ago

I can see how frustrating it might be, but the overall idea of SO is "no duplicates". They don't want to have 1000 questions which are exactly the same but with slightly different phrasing. It can be problematic for total newcomers, but at the same time it makes it more useful for professionals: instead of having 1000 questions how to X with 1 reply, you have one canonical question with 20 replies sorted by upvotes and you can quickly see which one is likely the best.

FWIW, I found LLMs to be actually really good at those basic questions where I'm an expert at language X and I ask how to do a similar thing in Y, using Y's terms (which might be named differently in X).

I believe this actually would work well:

- extra basic things, or things that depend on opinion etc: ask LLMs and let they infer and steer you

- advanced / off the beaten path questions that LLMs hallucinate on: ask on SO

noirscape

7 hours ago

The problem SO tends to run into is when you have a question that on the surface seems like it matches another question (i.e. because the question title is bad), and then a very different question is closed as a dupe pointing to that question because the titles are similar.

Since there's no way to appeal duplicate close votes on SO until you have a pretty large amount of rep, this kinda creates a problem where there's a "silent mass" of duplicate questions that aren't really duplicates.

A basic example is this question: https://stackoverflow.com/q/27957454 , which is about disabling PRs on GitHub on the surface. The body text however reveals that the poster is instead asking how they can set up branch permissions and get certain accounts to bypass them.

I can already assure you that just by basic searching, this question will pop up first when you look up disabling PRs, and the accepted answer answers the question body (which means it's almost certain a different question has been closed as a duplicate of this one) rather than the question title. You could give a more informative answer (which kinda happened here), but this is technically off-topic to the question being closed.

That's where SO gets its bad rep for inaccurate duplicate closing from.

bryanrasmussen

7 hours ago

>I can see how frustrating it might be

It's certainly not frustrating for me; I ask a question maybe once a year on SO. Most of its content is, in my chosen disciplines, not technically interesting - no better than looking up code snippets in documentation (which most of the time is what it really, really is).

I suppose it's frustrating for SO that people no longer find it worthwhile to ask questions there.

>advanced / off the beaten path

Show me an advanced and off-the-beaten-path question that SO has answered well - it's just not worth the effort to try to get an answer there. If you have an advanced, off-the-beaten-path question that you can't answer, you might ask it on SO just "in case", but really you will find the answer somewhere else, or not at all, in my experience.

Izkata

5 hours ago

> I don't know the first thing about this thing, help me get to where I know the first thing. This is not allowed any more.

This may have been allowed in like the first year, while they were figuring out what kind of moderation worked, but it hasn't been at least since I started using it in like 2011. Such questions just kept slipping through the cracks because so many questions are constantly being posted.

SoftTalker

8 hours ago

The first one especially is not interesting except to the person asking the question, who wants to be spoon-fed answers instead of making any effort of his own to acquire foundational knowledge. Often these are students asking for someone to solve their homework problems.

Pro/Con questions are too likely to involve opinion and degenerate into flamewars. Some could be answered factually, but mostly are not. Others have no clear answers.

bryanrasmussen

7 hours ago

Thank you for bringing up the default SO reasons why these are not the case, but first off:

>Often these are students asking for someone to solve their homework problems.

I don't think I've been in any class since elementary school in which I did not have foundational knowledge. I'm talking about "I just realized there must be a technical discipline that handles this issue and I can't google my way to it" level questions.

If I'm a student, I have a textbook and the ability to read. I'm not asking questions answerable from the textbook or the relevant literature of the thing I am studying, because being in a class on the subject, I would "know the first thing", to quote my earlier post - that first thing being how to get more good and relevant knowledge on the thing I am taking a class in.

I'm talking about things where you don't even know what questions to ask to get that foundational knowledge - which are among the most interesting questions to ask. The problem with SO is it only wants me to ask questions in a field in which I am already fairly expert but have just hit a temporary stumbling block for some reason.

I remember when I was working on a big government security project, and there was a Java guy who was an expert in a field that I knew nothing about, and he would laugh and say you can't go to SO and ask "how do I ..." followed by a long bit of technical jargon outside my field that I sort of understood hung together - maybe eigenvectors came up (this was in 2013).

Second thing: yes, I know SO does not want people to ask non-factual questions, and it does not want me to ask questions in fields in which I am uninformed. So it follows that it wants me to ask questions I could probably figure out myself one way or another - SO is just more convenient.

I gave some reasons why I do not find SO particularly convenient or useful given their constraints, implying this is probably the same for others. You said two of my reasons were no good, but I notice you did not have any input on the underlying question: why are people not asking as many questions on SO as they once did?

SoftTalker

6 hours ago

SO is what it is, they have made the choices they made as to what questions are appropriate on their platform.

I don't know why SO questions are declining -- perhaps people find SO frustrating, as you seem to, and they give up. I myself have never posted a question on SO as I generally have found that my questions had already been asked and answered. And lately, perhaps LLMs are providing better avenues for the sorts of questions you describe. That seems very plausible to me.

Ferret7446

4 hours ago

> I don't think I've been in any class since elementary school in which I did not have foundational knowledge

> If I'm a student, I have a textbook and the ability to read

You are such an outlier that I don't think you have the awareness to make any useful observations on this topic. Quite a lot of students in the US are now starting to lack the ability to read, horrifyingly (and it was never 100%), and using ChatGPT to do homework is common.

Ferret7446

4 hours ago

The problem is that SO is not a Q&A site although it calls itself that (which is admittedly misleading). It is a community edited knowledgebase, basically a wiki, where the content is Q&As. It just so happens that one method of contributing to the site is by writing questions for other people to write answers to.

If you ask a question (i.e., add content to the wiki) that is not in scope, then of course it will get removed.

fforflo

6 hours ago

Well, we'll know we've reached AGI when the LLM says "this chat has been marked as duplicate".

atomic128

8 hours ago

Eventually, large language models will be the end of open source. That's ok, just accept it.

Large language models are used to aggregate and interpolate intellectual property.

This is performed with no acknowledgement of authorship or lineage, with no attribution or citation.

In effect, the intellectual property used to train such models becomes anonymous common property.

The social rewards (e.g., credit, respect) that often motivate open source work are undermined.

That's how it ends.

zmgsabst

7 hours ago

Why wouldn’t you use LLMs to write even more open source?

The cost of contributions falls dramatically: e.g., $100 is 200M tokens of GPT-3.5, so you're talking enough to spend 10,000 tokens developing each line of a 20 kloc project (amortized).

That’s a moderate project for a single donation and an afternoon of managing a workflow framework.

atomic128

6 hours ago

What you're describing is "open slop", and yes, there will be a lot of it.

Open source as we know it today, not so much.

yapyap

8 hours ago

no it won’t, it’ll just make it more niche than it already is.

atomic128

3 hours ago

LLM users are feeding their entropy into the model, and paying for the privilege.

These LLM users produce the new training data. They are being assimilated into the tool.

This is the future of "open source": Anonymous common property continuously harvested from, and distributed to, LLM users.

gspr

2 hours ago

I don't understand this take.

If LLMs will be the end of open source, then they will constitute that end for exactly the reason you write:

> Large language models are used to aggregate and interpolate intellectual property.

> This is performed with no acknowledgement of authorship or lineage, with no attribution or citation.

> In effect, the intellectual property used to train such models becomes anonymous common property.

And if those things are true and allowed to continue, then any IP relying on copyright is equally threatened. That could of course be the case, but it's hardly unique to open source. Open source is no different, here. Or are you suggesting that non-open-source copyrighted material (code or otherwise) is protected by keeping the "source" (or equivalent) secret? Good luck making money on that blockbuster movie if you don't dare show it to anyone, or that novel if you don't dare let people read it.

> The social rewards (e.g., credit, respect) that often motivate open source work are undermined.

First of all: Those aren't the only social rewards that motivate open source work. I'd even wager they aren't the most common motivators. Those rewards seem more like the image that actors that try to social-network-ify or gamify open source work want to paint.

Second: Why would those things go away? The artistic joy that drives a portrait painter didn't go away when the camera was invented. Sure, the pure monetary drive might suffer, but that drive is perhaps the drive that's least specific to open source work.

A4ET8a8uTh0

27 minutes ago

<< Why would those things go away?

I think that is because, overall, human nature does not change that much.

<< Open source is no different, here. Or are you suggesting that non-open-source copyrighted material (code or otherwise) is protected by keeping the "source" (or equivalent) secret? Good luck making money on that blockbuster movie if you don't dare show it to anyone, or that novel if you don't dare let people read it.

You may be conflating several different media types and we don't even know what the lawsuit tea leaves will tell us about that kind of visual/audio IP. As far as code goes, I think most companies have already shown how they protect themselves from 'open' source code.

Havoc

8 hours ago

I’d imagine they also narrow the range of knowledge and discourse in general.

A bit like how, if you ask an LLM to tell you a joke, they all tend to go with the same one.

MASNeo

8 hours ago

Wondering about the wider implications. If technical interactions online decline, what about real life? How do we rate human competence against an AI once society gets into the habit of asking an AI first? Will we start to constantly question human advice or responses, and what does that do to the human condition?

I am active in a few specialized fields, and already I have to defend my advice against poorly crafted prompt responses.

VancouverMan

5 hours ago

> Will we start to constantly question human advice or responses, and what does that do to the human condition?

I'm surprised when people don't already engage in questioning like that.

I've had to do it for decades at this point.

Much of the worst advice and information I've ever received has come from expensive human so-called "professionals" and "experts" like doctors, accountants, lawyers, financial advisors, professors, journalists, mechanics, and so on.

I now assume that anything such "experts" tell me is wrong, and too often that ends up being true.

Sourcing information and advice from a larger pool of online knowledge, even if the sources may be deemed "amateur" or "hobbyist" or "unreliable", has generally provided me with far better results and outcomes.

If an LLM is built upon a wide base of source information, I'm inclined to trust what it generates more than what a single human "professional" or "expert" says.

toofy

37 minutes ago

does this mean you trust complete randoms just as much?

if i need advice on repairing a weird unique metal piece on a 1959 corvette, im going to trust the advice of an expert in classic corvettes way before i trust the advice of my barber who knows nothing about cars but confidently tells me to check the tire pressure.

this “oh no, experts have been wrong before” we see so much is wild to me. in nuanced fields i’ll take the advice of experts any day of the week waaaaaay before i take the advice from someone whose entire knowledge of the topic comes from a couple twitter posts and a couple of youtubes but whose rhetoric sounds confident. confidently wrong dipshits and sophists are one of the plagues of the modern internet.

in complex nuanced subjects are experts wrong sometimes? absofuckinlutely. in complex nuanced subjects are they correct more often than random “did-my-own-research-for-20-minutes-but-got-distracted-because-i-can’t-focus-for-more-than-3-paragraphs-but-i-sound-confident guy?” absofuckinlutely.

wizzwizz4

2 hours ago

These systems behave more like individual experts than like the internet – except they're more likely to be wrong than an expert is.

bloomingkales

8 hours ago

Guess we need an Agent that logs and re-contributes to Stackoverflow (for example) automatically.

Then also have agents that automatically give upvotes for used solutions. Weird world.

I’m just imagining the precogs talking to each other in Minority Report if that makes sense.

rq1

8 hours ago

People should just share their conversations with the LLMs online, no?

This would be blogging 5.0. Or web 7.0.

SunlitCat

6 hours ago

Well, I just asked ChatGPT to answer my "How do I print hello world in C++?" with a typical Stack Overflow answer.

Lo and behold, the answer was very polite, explanatory, and even listed common mistakes. It even added two very helpful user comments!

I asked it again how this answer would look in 2024, and it just updated the answer to the latest C++ standard!

Then! I asked it what a moderator would say when they chime in. Of course the moderator reminded everyone to stay focused on the question, avoid opinions, and back their answers with documentation or standards. In the end the mod thanked everyone for their contributions and for keeping the discussion constructive!

Ah! What a wonderful world ChatGPT is living in! I want to be there too!

p0w3n3d

4 hours ago

That's what I've been predicting and am scared of: LLMs learn from online Q&A platforms, but people are already ceasing to post questions and receive answers. The remaining knowledge sources will get poisoned with inaccurate LLM-generated data, and therefore the entropy available to LLMs will be damped by the LLMs themselves (a negative feedback loop).

optimiz3

9 hours ago

If a site aims to commoditize shared expertise, royalties should be paid. Why would anyone willingly reduce their earning power, let alone hand away the right for someone else to profit from selling their knowledge, unattributed no less?

Best bet is to book publish, and require a license from anyone that wants to train on it.

afh1

9 hours ago

Why open source anything, let alone with permissive licensing, right?

immibis

27 minutes ago

This is a real problem with permissive licensing. Large corporations effectively brainwashed large swaths of developers into working for free. Not working for the commons for free, as in AGPL, but working for corporations for free.

optimiz3

9 hours ago

To a degree, yes. I only open source work where I expect reciprocal value from other contributions.

johannes1234321

8 hours ago

There is a lot of indirect hardly measurable value one can gain.

Going back to the original topic: by giving an answer to somebody on a Q&A site, they might be a kid learning who then builds solutions I benefit from later, again. Similar with software.

And I also consider the total gain of knowledge for our society at large a gain.

And my marginal cost for many things is low - often lower than a cost-benefit calculation would suggest.

And some Q&A questions strike a nerve and are interesting for me to answer (be it in thinking about the problem or in trying to boil it down to a good answer) - similar to open source. Some programming tasks are fun problems to solve, that's a gain, and then sharing the result costs me nothing.

benfortuna

9 hours ago

I think that is antithetical to the idea of Open Source. If you expect contributions then pay a bounty, don't pretend.

optimiz3

9 hours ago

The bounty is you getting to use my work (shared in good faith no less). Appreciate the charity and don't be a freeloader or you'll get less in the future.

andrepd

9 hours ago

GPL is antithetical to open source? Odd take

verdverm

8 hours ago

There is a permissionless (MIT) vs permissioned (GPL) difference that is at the heart of the debate of what society thinks open source should mean

Y_Y

9 hours ago

See also: BSD vs. GPL

jncfhnb

9 hours ago

Because it’s a marginal effect on your earning power and it’s a nice thing to do.

optimiz3

8 hours ago

The management of these walled gardens will keep saying that to your face as they sell your contributions. Meanwhile your family gets nothing.

jncfhnb

7 hours ago

Did your family get anything from you sharing this opinion? If not, why did you share it? Are you suggesting that your personal motivations for posting this cynicism are reasonable but that similar motivations that are altruistic for helping someone are not?

optimiz3

5 hours ago

Sharing this opinion doesn't sacrifice my primary economic utility, and in fact disseminates a sentiment that if more widespread would empower everyone to realize more of the value they offer. Please do train an LLM to inform people to seek licensing arrangements for the expertise they provide.

jncfhnb

5 hours ago

That’s just dumb, man. You’re not sacrificing anything by giving someone a helpful answer.

8note

2 hours ago

Giving it away for free, you are ensuring there isn't a consulting gig that charges for giving helpful answers.

AlexandrB

8 hours ago

"It's a nice thing to do" never seems to sway online platforms to treat their users better. This kind of asymmetry seems to only ever go one way.

falcor84

8 hours ago

As a mid-core SO user (4 digit reputation), I never felt like I needed them to treat me better. I always feel that while I'm contributing a bit, I get so much more value out of SO than what I've put in, and am grateful for it being there. It might also have something to do with me being old enough to remember the original expertsexchange, as well as those MSDN support documentation CDs. I'm much happier now.

immibis

26 minutes ago

Stack Overflow won't even let me delete my own content now that they're violating the license to it.

wwweston

6 hours ago

When the jobs side of SO was active, it effectively did this. Strong answers and scoring were compensated with prospective employer attention. For a few years, this was actually where the majority of my new job leads came from. It was a pretty rewarding ecosystem, though not without its problems.

Not sure why they shut down jobs; they recently brought back a poorer version of it.

malicka

8 hours ago

While there is a thing to be said about the unethical business practices of Quora/StackOverflow, I reject the framing of “reducing your earning power.” Not everything is about transactions or self-benefit, especially when it comes to knowledge; it’s about contributing and collaboration. There is immense intrinsic value to that. I’m glad we don’t live in your world, where libre software is a pipe-dream and hackers hoard their knowledge like sickly dragons.

simonw

8 hours ago

... you just shared your expertise here on Hacker News in the form of this comment without any expectation of royalties. How is posting on StackOverflow different?

krtalc

8 hours ago

One could answer that question to people whose salary does not depend upon not understanding the answer.

immibis

33 minutes ago

So do the new corporate policies of those platforms.

gigatexal

6 hours ago

Because toxic but well-meaning mods at Stack Overflow made us not want to use it anymore.

Abecid

4 hours ago

I think this is just the future, though. Why ask other people if LLMs can just retrieve, read, and train on official documentation?

jetsetk

4 hours ago

Official documentation is not always complete. It depends on the diligence of whoever wrote it and how good they are at writing. Customers and users will always send mails or open tickets to ask about this and that in the docs afterwards. You can't rely on just learning from or retrieving the docs; clarifications by some dev, or by someone who found a solution/workaround, will always be required.

joshdavham

7 hours ago

With that being said, I imagine the quality of the questions has also improved quite a bit. I definitely don't condone the rude behaviour on SO, but I also understand that the site used to be bombarded constantly with low-quality questions that LLMs can now thankfully handle.

vitiral

9 hours ago

We need to refine our tech stack to create a new one which is understandable by humans, before LLMs pollute our current stack to the point it's impossible to understand or modify. That's what I'm doing at https://lua.civboot.org

mrcino

8 hours ago

By Public knowledge sharing, do they mean bazillions of StackOverflow duplicates?

knotimpressed

8 hours ago

The article mentions that all kinds of posts were reduced, not just duplicates or even simple questions.

delduca

2 hours ago

Marked as duplicate.

kajaktum

7 hours ago

I have no idea where to ask questions nowadays. Stack Overflow is way "too slow" (go to the website, write a nice well-formatted thread, wait for answers), but there are way faster options now, namely message groups.

For example, I was wondering if it's okay to move my home directory to a different filesystem altogether and create a symlink from /home/. Where do I ask such questions? The freaking ZFS mailing list? SO? It was just a passing question, and what I wanted more than the answer was the sense of community.

The only place I know of that has a wide enough range of interests, with many people who each know some of this stuff quite deeply, that is public and easily accessible, is unironically 4chan /g/.

I would rather go there than to Discord, where humanity's knowledge is piped to /dev/null.

nunez

3 hours ago

Reddit was a place, until the API changes were made. Discord is another, at the cost of public discoverability. Barring that: man pages and grokking the sources.

CoastalCoder

7 hours ago

I guess I'm out of the loop. What does "/g/" mean?

aezart

6 hours ago

It's the technology message board on 4chan, each board has a name like that. /a/ for anime, /v/ for video games, etc.

scotty79

10 hours ago

Don't they just reduce the Q part of Q&A? And since the Q was A'd by AI, doesn't that mean the A was already out there and people just couldn't find it, but the AI did?

lordgrenville

10 hours ago

The answer by humans is a) publicly accessible b) hallucination-free (although it still may not be correct) c) subject to a voting process which gives a good signal of how much we should trust it.

Which makes me think: maybe a good move for Stack Overflow (which does not allow the submission of LLM-generated answers - wisely, imo) would be to add an AI agent that suggests an answer for each question, which people could vote on. That way you elicit both human and machine answers, and still have the verification process.

david-gpu

9 hours ago

As a user, why would I care whether an answer is "incorrect" or "hallucinated"? Neither one is going to solve the problem I have at hand. It sounds like a distinction without a difference.

lordgrenville

8 hours ago

One relevant difference is that a better-quality human answer is correlated with certain "tells": correct formatting and grammar, longer answers, higher reputation. An incorrect LLM answer looks (from the outside) exactly the same as a correct answer.

mikepurvis

8 hours ago

Obviously there are exceptions but human-wrong answers tend to be more subtly wrong whereas hallucinated answers are just baffling and nonsensical.

Davidzheng

9 hours ago

I don't think human mistakes are distinguishable from hallucinations.

Y_Y

9 hours ago

Let's train a discriminator and see!

intended

9 hours ago

Why a vote? Voting != Verification.

TeMPOraL

9 hours ago

LLMs are a much better experience on the "Q side". Sure, there's the occasional hallucination here and there, but Q&A sites are not all StackOverflow. Most of them are just content farms for SEO and advertising purposes - meaning the veracity of the content doesn't matter, as long as it's driving clicks. At this moment, this makes LLMs much more trustworthy.

scotty79

9 hours ago

It's a good idea, but probably not easy to implement. SO answers are usually quite neat, like an email. Solving a problem with ChatGPT is more like... chat. It's hard to turn that into something googlable, and Google is how SO gets most of its traffic and utility.

torginus

2 hours ago

Honestly, online Q&A platforms are doing a fine job of ruining themselves on their own. Just today I found out that Quora has started locking its high-quality answers, written by actual experts, behind paywalls. Get bent.

jmyeet

9 hours ago

It's a losing battle to try and maintain walled gardens for these corpuses of human-generated text that have become valuable to train LLMs. The horse has probably already bolted.

I see this as a temporary problem, however, because LLMs are transitional. At some point it won't be necessary to train an LLM on the entirety of Reddit plus everything else ever written, because there are obvious limits to statistical models like this and, as a counterpoint, that's not how humans learn. You may have read hundreds of books in your life, maybe even thousands. You haven't read a million. You don't need to.

I find it interesting that this issue (which is theft, to be clear) is being framed as theft from the site or company that "owns" that data, rather than theft from the users who created it. All these user-generated content ("UGC") sites are doomed to eventually fail because their motivations diverge from their users and the endless quest to increase profits inevitably drives users away.

Another issue is how much IP consumption constitutes theft. If an LLM watches every movie ever made, that's probably theft. But how many is too many? Apocalypse Now was loosely based on, or at least inspired by, Heart of Darkness (the novel), yet you can't accuse a human of "theft" for reading Heart of Darkness.

All art is derivative, as they say.

vlovich123

9 hours ago

> At some point it won't be necessary to train an LLM on the entirety of Reddit plus everything else ever written, because there are obvious limits to statistical models like this and, as a counterpoint, that's not how humans learn. You may have read hundreds of books in your life, maybe even thousands. You haven't read a million. You don't need to.

I agree, but I think it may be privileging the human intelligence mechanism a bit too much. These LLMs are polymaths that can spit out content at a superhuman rate. They can generate poetry and literature just as they can code, and answers about physics and car repair. It's very rare for a human to be able to do that, especially these days.

So I agree they’re transitional, but only in the sense that our brains are transitional from the basal ganglia to the neocortex. In that sense I think LLMs will probably be part of a future GAI brain with other things tacked on, but it's not clear it will necessarily evolve to work like a human brain does.

jprete

9 hours ago

I think the actual reason people can't do it is that we avoid situations with high risk and no apparent reward. And we aren't sufficiently supportive of other people doing surprising things (so there's no reward for trying). I.e. it's a modern culture problem, not a human brain problem.

jmyeet

9 hours ago

> These LLMs are polymaths that can spit out content at a super human rate.

Do you mean in theory or currently? Because currently, LLMs make simple errors (eg [1]) and are more capable of spitting out, well, nonsense. I think it's safe to say we're a long way from LLMs producing anything creatively good.

I'll put it this way: you won't be getting The Godfather from LLMs anytime soon but you can probably get an industrial film with generic music that tells you how to safely handle solvents, maybe.

Computers are generally good at doing math, but LLMs generally aren't [2], and that really demonstrates the weaknesses of this statistical approach. ChatGPT (as one example) doesn't understand what numbers are or how to multiply them. It relies on seeing similar answers to derive a likely answer, so it often gets the first and last digits of the answer correct but not the middle. You can't keep scaling the input data to have it see every possible math question. That's just not practical.

Now multiplying two large numbers is a solvable problem. Counting Rs in strawberry is a solvable problem. But statistical LLMs are going to have a massive long tail of these problems. It's really going to take the next generational change to make progress.

[1]: https://www.inc.com/kit-eaton/how-many-rs-in-strawberry-this...

[2]: https://www.reachcapital.com/2024/07/16/why-llms-are-bad-at-...

simonw

8 hours ago

Both the "count the Rs in strawberry" and the "multiply two large numbers" things have been solved for over a year now by the tool usage pattern: give an LLM the ability to delegate to a code execution environment for things it's inherently bad at and train it how to identify when to use that option.

vlovich123

8 hours ago

I think the point is that playing whack-a-mole is an effective practical strategy for shoring up individual weaknesses (or even classes of weaknesses), but that doesn't get you to general reasoning - unless you think that intelligence evolved this way. Given the adaptability of intelligence across the animal kingdom to novel environments never seen before, I don't think that can be anything other than a short-term strategy for AGI.

simonw

6 hours ago

Sure, LLMs won't ever get to general reasoning (for pick your definition of "reasoning") unassisted.

I think that adding different forms of assistance remains the most interesting pattern right now.

vlovich123

8 hours ago

I think we’re in agreement. It’s going to take next generation architecture to address the flaws where the LLM often can’t even correct its mistake when it’s pointed out as with the strawberry example.

I still think transformers and LLMs will likely remain as some component within that next gen architecture vs something completely alien.

0x1ceb00da

8 hours ago

> You may have read hundreds of books in your life, maybe even thousands. You haven't read a million. You don't need to.

Sometimes online forums are the only place where you can find solutions for niche situations and edge cases. Tricks which would have been very difficult to figure out on your own. LLMs can train on the official documentation of tools/libraries, but they can't experiment and figure out solutions to weird problems, which are unfortunately very common in the tech industry. If people stop sharing such solutions with others, it might become a big problem.

simonw

8 hours ago

"LLMs can train on the official documentation of tools l/libraries but they can't experiment and figure out solutions to weird problems"

LLMs train on way more than just the official documentation: they train on the code itself, the unit tests for that code (which, for well written projects, cover all sorts of undocumented edge cases) and - for popular projects - thousands of examples of that library being used (and unit tested) "in the wild".

This is why LLMs are so effective at helping figure out edge-cases for widely used libraries.

The best coding LLMs are also trained on additional custom examples written by humans who were paid to build proprietary training data for those LLMs.

I suspect they are increasingly trained on artificially created examples which have been validated (to a certain extent) through executing that code before adding it to the training data. That's a unique advantage for code - it's a lot harder to "verify" non-code generated prose since you can't execute that and see if you get an error.

0x1ceb00da

8 hours ago

> they train on the code itself, the unit tests for that code

If understanding the code was enough, we wouldn't have any bugs or counterintuitive behaviors.

> and - for popular projects - thousands of examples of that library being used (and unit tested) "in the wild".

If people stop contributing to forums, we won't have any such data for new things that are being made.

simonw

6 hours ago

The examples I'm talking about come from openly licensed code in sources like GitHub, not from StackOverflow.

I would argue that code in GitHub is much more useful, because it's presented in the context of a larger application and is also more likely to work.

skydhash

8 hours ago

> Sometimes online forums are the only place where you can find solutions for niche situations and edge cases.

That's the most valuable aspect of it. When you find yourself in one of these niche situations, it's nice when you see someone has encountered it and has done the legwork to solve it, saving you hours and days. And that's why wikis like the Arch Wiki are important. You need people to document the system, not just the individual components.

falcor84

8 hours ago

> that's not how humans learn

I've been thinking about this a lot lately. Could we train an AI, e.g. using RL and GAN, where it gets an IT task to perform based on a body of documentation, such that its fitness would then be measured based on both direct success on the task, and on the creation of new (distilled and better written) documentation that would allow an otherwise context-less copy of itself to do well on the task?
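
As a purely illustrative sketch of that fitness signal (run_agent() here is a hypothetical harness that attempts the IT task given some documentation and returns a success score plus the distilled docs the agent wrote along the way):

    from typing import Callable, Tuple

    # run_agent(task, docs) -> (success_score, distilled_docs); hypothetical.
    AgentFn = Callable[[str, str], Tuple[float, str]]

    def episode_reward(task: str, original_docs: str,
                       run_agent: AgentFn, alpha: float = 0.5) -> float:
        # Direct success: attempt the task with the original documentation,
        # producing distilled docs as a side product.
        direct_score, distilled_docs = run_agent(task, original_docs)
        # Transfer success: a fresh, context-less copy attempts the same
        # task armed only with the distilled docs, so this score measures
        # how good the new documentation actually is.
        transfer_score, _ = run_agent(task, distilled_docs)
        # Blending both means writing useful docs is itself optimized.
        return alpha * direct_score + (1 - alpha) * transfer_score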

jumping_frog

8 hours ago

Just to add to your point, consider a book like "Finite and Infinite Games". I think I could "recreate" the knowledge and main thesis of the book from my readings in other areas.

'Listening to different spiritual gurus saying the same thing using different words' is like 'watching the same coloured glass pieces getting rearranged to form new patterns in a kaleidoscope'.

szundi

8 hours ago

Only half true: maybe reasoning and actual understanding are not the strength of LLMs, but it is fascinating that they can actually produce good info from everything they have read - unlike me, who has only read a fraction of that. Maybe dumb, but good memory.

So I think future AI also has to read everything, if it is used the way average people use ChatGPT these days: to ask for advice about almost anything.

airstrike

9 hours ago

> (which is theft, to be clear)

> Another issue is how much IP consumption constitutes theft? If an LLM watches every movie ever made, that's probably theft.

It's hard to reconcile those two views, and I don't think theft is defined by "how much" is being stolen.

Artgor

9 hours ago

Well, if the users ask frequent/common questions to ChatGPT and get acceptable answers, is this even a problem? If the volume of duplicate questions decreases, there should be no bad influence on the training data, right?

jeremyjh

8 hours ago

They spoke to this point in the abstract. They observe a similar drop in less common and more advanced questions.

melenaboija

8 hours ago

It’s been a relief to find a platform where I can ask questions without the fear of being humiliated

Half joking, but I am pretty tired of SO pedantry.

PhilipRoman

8 hours ago

I haven't really found stackoverflow to be that humiliating (compared to some IRC rooms or forums), basic questions get asked and answered all the time. But the worst part is when you want to do something off the beaten path.

Q: how do I do thing X in C?

A: Why do you need to know this? The C standard doesn't say anything about X. The answer will depend on your compiler and platform. Are you sure you want to do X instead of Y? What version of Ubuntu are you running?

haolez

8 hours ago

The first time that I asked a question on #cpp @Freenode was a unique experience for my younger self.

My message contained greetings and the question in the same message. I was banned immediately and the response from the mods was:

- do not greet; we don't have time for that bullshit

- do not use natural language questions; submit a test case and we will understand what you mean through your code

- do not abbreviate words (you have abbreviated "you" as "u"); if you do not have time to type the words, we do not have time to read them

The ban lasted for a week! :D

jancsika

5 hours ago

This being HN, I'd love to hear from one of the many IRC channel mods who literally typed (I'd guess copy/pasted) this kind of text into their chat room topics and auto-responders.

If you're out there-- how does it feel to know that what you meant as an efficient course-correction for newcomers was instead a social shaming that cut so deep that the message you wrote is still burned verbatim into their memory after all these years?

To be clear, I'm taking OP's experience as a common case of IRC newbies at that time on many channels. I certainly experienced something like it (though I can't remember the exact text), and I've read many others post on HN about the same behavior from the IRC days.

Edit: clarifications

HPsquared

4 hours ago

I think a lot of unpaid online discussion forum moderation volunteers get psychic profit from power tripping.

hinkley

4 hours ago

Give a man a little power.

CogitoCogito

4 hours ago

> was instead a social shaming that cut so deep that the message you wrote is still burned verbatim into their memory after all these years?

Maybe that was the point?

haolez

24 minutes ago

To be fair, after the ban expired, I started submitting the test cases as instructed and the community was very helpful under these constraints.

bqmjjx0kac

7 hours ago

> do not use natural language questions

That is really absurd! AFAIK, it is not possible to pose a question to a human in C++.

This level of dogmatism and ignorance of human communication reminds me of a TL I worked with once who believed that their project's C codebase was "self-documenting". They would categorically reject PRs that contained comments, even "why" comments that were legitimately informative. It was a very frustrating experience, but at least I have some anecdotes now that are funny in retrospect.

ravenstine

5 hours ago

Self-documenting code is one of the worst ideas in programming. Like you, I've had to work with teams where my PRs would be blocked until I removed my comments. I'm not talking about pointless comments like "# loop through the array", but JSDoc-style comments describing why a function was needed.

I will no longer work anywhere that has this kind of culture.

seattle_spring

2 hours ago

Hard to agree or disagree without real examples. I've worked with people who insist on writing paragraphs of stories as comments on top of some pretty obviously self-descriptive code. In those cases, the comments were indeed just clutter that would likely soon be out of date anyway. Conversely, code that needs huge comments like that should usually just be refactored anyway. It's pretty rare to actually need written comments to explain what's going on when the code is written semantically and thoughtfully.

wccrawford

8 hours ago

A one week ban on the first message is clearly gatekeeping. What a bunch of jerks. A 1 hour ban would have been a lot more appropriate, and escalate from there if the person can't follow the rules.

Don't even get me started about how dumb rule 2 is, though. And rule 3 doesn't even work for normal English as many things are abbreviated, e.g. this example.

And of course, you didn't greet and wait, you just put a pleasantry in the same message. Jeez.

I'm 100% sure I'd never have gone back after that rude ban.

GeoAtreides

8 hours ago

"I'm 100% sure I'd never have gone back after that rude ban."

upon saying this, the young apprentice was enlightened

luckylion

6 hours ago

> And of course, you didn't greet and wait, you just put a pleasantry in the same message. Jeez.

I'm pretty sure that "rule" was more aimed towards "just ask your question" rather than "greet, make smalltalk, then ask your question".

I have similar rules, though I don't communicate them as aggressively, and I don't ban people for breaking them; I just don't reply to greetings from people I know aren't actually looking to ask me how I've been. It's a lot easier if you send the question you have instead of sending "Hi, how are you?" and then waiting 3 minutes to type out your question.

SunlitCat

7 hours ago

That contradiction is funny, tho:

> - do not greet; we don't have time for that bullshit

and

> do not abbreviate words (you have abbreviated "you" as "u"); if you do not have time to type the words, we do not have time to read them

So they have apparently enough time to read full words, it seems!

sokoloff

7 hours ago

I think reading “u” takes longer than reading “you”.

With “u”, I have to pause for a moment and think “that’s not a normal word; I wonder if they meant to type ‘i’ instead (and just hit a little left of target)?” and then maybe read the passage twice to see which is more likely.

I don’t think it’s quite as much a contradiction. (It still could be more gruff than needed.)

kfajdsl

3 hours ago

probably a generational thing, I process that and other common texting/internet abbreviations exactly like normal words.

beeboobaa3

7 hours ago

Yh u gtta b c00l w abbrvs

6510

5 hours ago

The noobs don't got how we get where we get?

edit: I remember how some communities changed into: the help isn't good enough, you should help harder, I want you to help me by these conventions. Then people leave after getting their answer and are never seen again, rather than joining the help desk.

tinco

6 hours ago

I learned a bunch of programming languages on IRC, and the C and C++ communities on freenode were by far the most toxic I've encountered.

Now that Rust is successfully assimilating those communities, I have noticed the same toxicity on less well moderated forums, like the subreddit. The Discord, luckily, is still great.

It's probably really important to separate the curmudgeons from the fresh initiates to provide an enjoyable and positive experience for both groups. Discord makes that really easy.

In the Ruby IRC channel curmudgeons would simply be shot down instantly with MINASWAN style arguments. In the Haskell IRC channel I guess it was basically accepted that everyone was learning new things all the time, and there was always someone willing to teach at the level you were trying to learn.

betaby

6 hours ago

Not my experience. IRC was 'toxic' since forever, but that's not toxicity, that's the inability to read emotion through transactional plain text. Once one accounts for that in their mental model, IRC is just fine.

bayindirh

5 hours ago

Yes, immature people are everywhere, but SO took it to a new level before they had to implement a code of conduct. I remember asking questions and getting "this is a common misconception, maybe you're looking for X instead" type answers that were actually helpful and kind.

After a while it got to the point that if you weren't asking about a complete problem that could be modeled as a logic statement, you were labeled as stupid for not knowing better. The thing is, if I knew better or had already found the answer, I wouldn't be asking SO in the first place.

After a couple of incidents, I left the place for the better. I can do my own research, and share my knowledge elsewhere.

Now they're training their own and others' models with that corpus; I'll never add a single dot to their dataset.

d0mine

3 hours ago

The major misunderstanding is that SO exists to help the question author first. It is not IRC. The most value comes from googling a topic and getting existing answers on SO.

In other words, perhaps in your very specific case your question is not an XY problem, but for the vast majority of visitors from Google it won't be so. https://en.wikipedia.org/wiki/XY_problem

Personally, I always answered on SO from at least two perspectives: how the question looks to someone coming from Google, and how the author might interpret it.

chii

4 hours ago

> Q: how do I do thing X in C?

SO does suck, but I've found that if you clarify in the question what you want, and pre-empt the "Y instead of X" type answers, you will get some results.

PhilipRoman

4 hours ago

I wish... Some commenters follow up with "Why do you think Y won't work for you?"

mhh__

8 hours ago

I find that this is mainly a problem in languages that attract "practical"/"best tool for the job" Philistines. Not going to name names right now but I had never really experienced this until I started using languages from a certain Washington based software company.

appendix-rock

8 hours ago

God. Yeah. I’ve always hated #IAMPolicies on Freenode :)

elicksaur

6 hours ago

On the other hand, I find it to be a fatal flaw that LLMs can’t say, “Hey you probably don’t actually want to do it that way.”

rytis

2 hours ago

I think it depends on how the question is constructed:

- I want to do X, how do I do it?

- I was thinking of doing X to achieve Y, wonder if that's a good idea?

Sometimes I really want to do X. I know it may be questionable, I know the safest answer is "you probably don't want to do it", and yet that's not someone else's (or the LLM's) business: I know exactly what I want to do, and I'm asking if anyone knows HOW, not IF.

So IMO it's not a flaw, it's a very useful feature, and I really do hope LLMs stay that way.

Ekaros

4 hours ago

I always wonder about that. Very often it seems that you need to be able to tell the LLM that it is wrong, and then it happily corrects itself. But if you do not know that the answer is wrong, how can you get the correct answer?

o11c

3 hours ago

Worse: if you think the LLM is wrong and try to correct it, it will happily invent something completely different (and actually wrong this time).

milesvp

2 hours ago

This happened to me the other day. I had framed a question in the ordinal case, and since I was trying to offload thinking anyways, I forgot that my use case was rotated, and failed to apply the rotation when testing the LLM answer. I corrected it twice before it wrapped around to the same (correct) previous answer, and that’s when I noticed my error. I apologized, added the rotation piece to my question, and it happily gave me a verifiably correct answer.

stemlord

5 hours ago

Sure but most common LLMs aren't going to be patronizing and presumptuous while they say so

ElFitz

5 hours ago

It’s also quite fun when you ask niche questions that haven’t been asked or answered yet ("How do I do X with Y?"), and just get downvoted for some reason.

That’s when I stopped investing any effort into that community.

It turned out that, counter-intuitively, it was impossible. And that wasn't documented anywhere.

hanniabu

7 hours ago

My questions always get closed and marked as a duplicate with a comment linking to a question that's unrelated

skywhopper

5 hours ago

I mean, those all sound like good questions. You might be a super genius, but most people who ask how to do X actually want to do Y. And if they DO want X, then those other questions about compiler and OS version really matter. The fact that you didn’t include them in your question shows you aren’t really respecting the time of the experts on the platform. If you know you are doing something unusual, then you need to provide a lot more context.

tlogan

8 hours ago

The main issue with Stack Overflow (and similar public Q&A platforms) is that many contributors do not know what they do not know, leading to inaccurate answers.

Additionally, these platforms tend to attract a fair amount of spam (self promotion etc) which can make it very hard to find high-quality responses.

cjauvin

8 hours ago

I find that LLMs are precisely that: marvelous engines to explore "what you don't know that you don't know", about anything.

milesvp

2 hours ago

I'm not sure how to take your comment, but I feel the same(?) way. I love that I can use LLMs to explore topics that I don't know well enough to find the right language to get hits on. I used to be able to do this with Google: after a few queries and skimming to the page-5 hits, I'd eventually find the one phrase that cracks open the topic. I haven't been able to do that with Google for at least 10 years. I do it regularly with LLMs today.

amenhotep

an hour ago

They are extraordinarily useful for this! "Blah blah blah high level naive description of what I want to know about, what is the term of art for this?"

Then equipped with the right term it's way easier to find reliable information about what you need.

internet101010

7 hours ago

Medium is even worse about this. It's more self-promotion than it is common help.

bee_rider

4 hours ago

QA platforms and blogging platforms both seem to have finite lifespans. QA forums (Stack Overflow, Quora, Yahoo Answers) do seem to last longer, but need to be moderated pretty aggressively or they turn into homework-help platforms.

Blogging platforms are the worst though. Medium looked pretty OK when it first came out. But now it is just a platform for self-promotion. Substack is like 75% of the way through that transition IMO.

People who do interesting things spend most of their time doing the thing. So, non-practicing bloggers and other influencers will naturally overwhelm the people who actually have anything interesting to report.

n_ary

6 hours ago

Begin rant.

I don't want to be that guy saying this, but 99% of the top Google results from Medium for anything technical are literally reworded/reframed versions of the official quick start guide.

There are some very rare gems, but it is hard to find them in the above-mentioned ocean of reworded quick starts disguised as "how to X" or "fixing Y". It almost reminds me of the SEO junk when you search "how to restart iPhone" and get answers that dance around: let it die from battery drain and then charge it, install this software, take it to the Apple repair shop, go to settings and traverse many steps - while never saying that on these models you can use the power + volume-up button trick.

End of rant.

bee_rider

4 hours ago

Somebody who just summarizes tutorials can write like 10 Medium posts in the time it takes an actual practitioner to do something legitimately interesting.

n_ary

3 hours ago

Well said. Most great articles I found on Medium are actually very old hence do not rank well.

mrkramer

6 hours ago

>The main issue with Stack Overflow (and similar public Q&A platforms) is that many contributors do not know what they do not know, leading to inaccurate answers.

The best Q&A platform would be one where experts and scientists answer questions, but sites like Wikipedia and Reddit showed that a broad audience can also be pretty good at providing useful information and moderating it.

TacticalCoder

4 hours ago

What you mention has indeed been a serious issue from day one.

But to me the worst issue is that it's now "Dead Overflow": most answers are completely, totally and utterly outdated. And given that they made the mistake of having the concept of an "accepted answer" (which should never have existed), the issue is only made worse.

If it's a question about things that don't change often, like algorithms, then it's OK. But for anything "tech", technical rot is a very real thing.

To me SO has both outdated and inaccurate answers.

delichon

8 hours ago

I've gotten answers from OpenAI that were technically correct but quite horrible in the longer term. I've gotten the same kinds of answers on Stack Overflow, but there other people are eager to add the necessary feedback. I got the same feedback from an LLM but only because in that case I knew enough to ask for it.

Maybe we can get this multi-headed advantage back from LLMs by applying a team of divergent AIs to the same problem. I've had other occasions when OpenAI gave me crap that Claude corrected, and vice versa.

zmgsabst

7 hours ago

You can usually even ask the same LLM:

- do a task

- criticize your job on that task

- redo that task based on criticism

I find giving the LLM a process greatly improves the results.
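
That process is easy to wire up as a loop. A minimal sketch, assuming only a call_llm(prompt) -> str helper for whichever model is in use (passed in here so the example stays self-contained):

    from typing import Callable

    def draft_critique_revise(call_llm: Callable[[str], str],
                              task: str, rounds: int = 2) -> str:
        """Do the task, have the model criticize its output, then redo it."""
        draft = call_llm(f"Complete the following task:\n{task}")
        for _ in range(rounds):
            critique = call_llm(
                f"Task:\n{task}\n\nAttempt:\n{draft}\n\n"
                "List the concrete problems with this attempt."
            )
            draft = call_llm(
                f"Task:\n{task}\n\nAttempt:\n{draft}\n\n"
                f"Criticism:\n{critique}\n\n"
                "Rewrite the attempt, addressing every point of criticism."
            )
        return draft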

iSnow

5 hours ago

That's a smart idea I didn't think of.

I've been arguing with Copilot back and forth where it gave me a half-working solution that seemed overly complicated, but since I was new to the tech used, I couldn't say what exactly was wrong. After a couple of hours, I googled the background, trusted my instinct, and was able to simplify the code.

In that situation, I iteratively improved the solution by telling Copilot that things seemed too complicated and that this or that wasn't working. That led the LLM to actually come back with better ideas. I kept asking myself why something like what you propose isn't baked into the system.

roughly

6 hours ago

What’s fun is that you can skip step 1. The LLM will happily critique its own nonexistent output.

drawnwren

4 hours ago

The papers I've read have shown LLM critics to be quite bad at their work. If you give an LLM a few known good and bad results, I think you'll see the LLM is just as likely to make good results bad as it is to make bad results good.

blazing234

7 hours ago

How do you know the second result is correct? Or the third? Or the fourth?

phil-martin

5 hours ago

I approach it the same way as the things I build myself - testing and measuring.

Although if I'm truly honest with myself, even after many years of developing, the true cycle of me writing code is: overconfidence, then shock that it didn't work 100% the first time, wondering if there is a bug in the compiler, and then reality setting in that of course the compiler is fine and I just made my 15th off-by-one error of the day :)

teeray

3 hours ago

I feel like this will be really beneficial in work environments. LLMs provide a lot of psychological safety when asking “dumb” questions that your coworkers might judge you for.

btbuildem

2 hours ago

At the same time, if a coworker comes asking me for something _strange_, my first response is to gently inquire as to the direction of their efforts instead of helping them find an answer. Often enough, this ends up going back up their "call stack" to some goofy logic branch, which we then undo together, and everyone is pleased.

herval

7 hours ago

The flipside to this is you can't get answers to anything _recent_, since the models are trained on content that is years behind. My feeling is it's getting increasingly difficult to figure out issues on the latest versions of libraries & tools, as the only options are private Discords (which aren't even googleable).

Vegenoid

6 hours ago

I think that knowledge hoarding may come back with a vengeance with the threat people feel from LLMs and offshoring.

chairmansteve

4 hours ago

Yep. For SO, the incentive was a high reputation. But now that an LLM is stealing your work, what's the point?

yieldcrv

6 hours ago

The models come out fast enough

Doesn’t seem to be a great strategy to always need these things retrained, but OpenAI’s o1 has things from early 2024

Don’t ask about knowledge cutoffs anymore, that’s not how these things are trained these days. They don’t know their names or the date.

herval

5 hours ago

Not my daily experience. It’s been impossible to get relevant answers to questions on multiple languages and frameworks, no matter the model. O1 frequently generates code using deprecated libraries (and is unable to fix it with iteration).

Not to mention there will be no data for the model to learn the new stuff anyway, since places like SO will get zero responses with the new stuff for the model to crawl

yieldcrv

5 hours ago

Yes, I encounter that too with o1, but only for things from just the last few months.

It is really difficult if you need project flags and configurations to make things work, instead of just code

GitHub issues get crawled, and that's where many of these frameworks have their community.

LeadB

8 hours ago

For the major programming languages, it must be a pretty esoteric question if it does not have an answer yet.

Increasingly, the free products of experts are stolen from them with the pretext that "users need to be protected". Entire open source projects are stolen by corporations and the experts are removed using the CoC wedge.

Now SO answers are stolen because the experts are not trained like hotel receptionists (while being short of time and unpaid).

I'm sure that the corporations who steal are very polite and CoC compliant, and when they fire all developers once an AGI is developed, the firing notices will be in business speak, polite, express regret and wish you all the best in your future endeavors!

stemlord

5 hours ago

Hm, fair point. Rudeness is actually a sign of humanity. Like that one Black Mirror episode.

grugagag

2 hours ago

Fair, but only for as long as rudeness is not the dominant mode.

appendix-rock

7 hours ago

I’m sorry that you ran afoul of a CoC or whatever, but this sounds like a real ‘airing dirty laundry’ tangent.

lrpanb

6 hours ago

One man's tangent is another man's big picture. It may be the case of course that some people guilty of CoC overreach are shaking in their boots right now because they went further than their corporations wanted them to go.

jneagu

5 hours ago

I am very curious to see how this is going to impact STEM education. Such a big part of an engineer's education happens informally by asking peers, teachers, and strangers questions. Different groups are more or less likely to do that consistently (e.g. https://journals.asm.org/doi/10.1128/jmbe.00100-21), and it can impact their progress. I've learned most from publicly asking "dumb" questions.

ocular-rockular

5 hours ago

It won't. If you look at advanced engineering/mathematics material online, the quality of its actual "explaining" of the content is abysmal. Most of the learning and understanding of intricacies happens via dialogue with professors/mentors/colleagues/etc.

That said, when that is not available, LLMs do an excellent job of rubber-ducking complicated topics.

jneagu

5 hours ago

To your latter point - that’s where I think most of the value of LLMs in education is. They can explain code beyond the educational content that’s already available out there. They are pretty decent at finding and explaining code errors. Someone who’s ramping up their coding skills can make a lot of progress with those two features alone.

ocular-rockular

2 hours ago

Yeah... only downside is that it requires a level of competency to recognize when the LLM is shoveling shit instead of gold.

amarcheschi

40 minutes ago

I've found chatgpt quite helpful in understanding some things that I couldn't figure out when approaching pytorch for an internship

Aurornis

6 hours ago

Many of the forums I enjoyed in the past have become heavily burdened by rules, processes, and expectations. They are frequented by people who spend hours every day reading everything and calling out any misstep.

Some of them are so overburdened that navigating all of the rules and expectations becomes a skill in itself. A single innocent misstep turns simple questions into lectures about how you’ve violated the rules.

One Slack I joined has created a Slackbot to enforce these rules. It became a game in itself for people to add new rules to the bot. Now it triggers on a large dictionary of problematic words such as “blind” (potentially offensive to people with vision impairments. Don’t bother discussing poker.). It gives a stern warning if anyone accidentally says “crazy” (offensive to those with mental health problems) or “you guys” (how dare you be so sexist).

They even created a rule that you have to make sure someone wants advice about a situation before offering it, because a group of people decided it was too presumptuous and potentially sexist (I don’t know how) for people to give advice when the other person may have only wanted to vent. This creates the weirdest situations where someone posts a question in channels named “Help and advice” and then lurkers wait to jump on anyone who offers advice if the question wasn’t explicitly phrased in a way that unequivocally requested advice.

It’s all so very tiresome to navigate. Some people appear to thrive in this environment where there are rules for everything. People who memorize and enforce all of the rules on others get to operate a tiny little power trip while opening an opportunity to lecture internet strangers all day.

It’s honestly refreshing to go from that to asking an LLM that you know isn’t going to turn your question into a lecture on social issues because you used a secretly problematic word or broke rule #73 on the ever growing list of community rules.

Ferret7446

4 hours ago

The reason those rules are created is because at some point something happened that necessitated that rule. (Not always of course, there are dictatorial mods.)

The fundamental problem is that communities/forums (in the general sense, e.g., market squares) don't scale, period. Because moderation and (transmission and error correction of) social mores don't scale.

Aurornis

4 hours ago

> The reason those rules are created is because at some point something happened that necessitated that rule. (Not always of course, there are dictatorial mods.)

Maybe initially, but in the community I’m talking about rules are introduced to prevent situations that might offend someone. For example, the rule warning against using the word “blind” was introduced by someone who thought it was a good thing to do in case a person with vision issues maybe got offended by it at some point in the future.

It's a small group of people introducing the rules. Introducing a new rule brings a lot of celebration for the person's thoughtfulness and earns a lot of praise and thanks for making the community safer. It's turned into a meta-game in itself, much like how I feel when I navigate Stack Overflow.

abraae

5 hours ago

> Some people appear to thrive in this environment where there are rules for everything. People who memorize and enforce all of the rules on others get to operate a tiny little power trip while opening an opportunity to lecture internet strangers all day.

Toddlers go through this sometimes around ages 2 or 3. They discover the "rules" for the first time and delight in brandishing them.

lynx23

6 hours ago

Full ACK. It has been liberating to be able to chat about a topic I always wanted to catch up on. And, even though I read a lot of apologies, at least nobody is telling me "That's not what you actually want."

Vegenoid

6 hours ago

Yeah, Stackoverflow kinda dug their own grave by making their platform and community very unpleasant to engage with.

lynx23

5 hours ago

Well, I believe the underlying problem of platforms like StackOverflow, ticketing systems (in-house and public) and even CRMs is not really solvable. The problem is, the quality of an answer is actually not easy to determine. All the mechanisms we have are hacks, and better solutions would need more resources... which leads to skewed incentives, and ultimately to a "knowledge" db that's actually not very good. People are incentivized to collect karma points, or whatever it is. But these metrics do not really reflect the quality of their work... Crowdsourcing these mechanisms via upvotes or whatever does not really work either, because quantity is not quality... As said, I believe this is a problem we cannot solve.

beeboobaa3

7 hours ago

I'm sorry but the funny thing is, the only people I've ever seen complain about SO are people who don't know how to search.

wokwokwok

7 hours ago

Everyone has a pet theory about what’s wrong with SO; but here’s the truth:

Whatever they’re doing, it isn’t working.

Blame mods. Blame AI. Blame askers… whatever man.

That is a sinking ship.

If you don’t see people complain about SO, it’s because they aren’t using it, not because they’re using the search.

Pretty hard to argue at this point that the problem is with the users being too shit to use the platform.

That’s some high level BS.

grogenaut

6 hours ago

I get good answers all the time on SO, or used to. My problem is that I've been downvoted several times for a "stupid question" and also been downvoted for not knowing what I was talking about in an area I'm an expert in.

I had one question that was a bit odd and went against testing dogma, so I had a friend post it. He pulled it 30 minutes later as it was already down 30 votes. It was a thing that's not best practice in most cases but is, in certain situations, the only way to do it. Like when you're testing APIs you don't control.

In some sections people also want textbook or better quality answers from random strangers on the internet.

The final part is that you at least used to have to build up a lot of karma to be able to post effectively (or at all) in some sections, or to be seen. Which is very catch-22.

So it can be both very useful and very sh*t.

fabian2k

6 hours ago

-30 votes would be extremely unusual on SO. That amount of votes even including upvotes in such a short time would be almost impossible. The only way you get that kind of massive voting is either if the question hits the "Hot Network Questions" or if an external site like HN with a high population of SO users links to it and drives lots of traffic. Questions with a negative score won't hit the hot network questions, so it seems very unlikely to me that it could be voted on that much.

o11c

3 hours ago

I don't think I've ever seen anything, no matter how bad, go below -5, and most don't go below -1. Once a question is downvoted:

- it's less likely that the question even gets shown

- it's less likely that people will even click on it

- it's less likely that people who think it's bad will bother to vote on it, since the votes are already doing the right thing

- if it's really bad, it will be marked for deletion before it gets that many downvotes anyway

SO has its problems but I don't even recognize half the things people complain about.

wizzwizz4

3 hours ago

You can get +30 from the HNQ list, but -30 is much harder, because the association bonus only gives you 101 rep, and the threshold for downvoting is 125.

wwweston

7 hours ago

I get useful info from SO all the time, so often that these days it’s rare I have to ask a question. When I do, the issue seems to be it’s likely niche enough that an answer could take days or weeks, which is too bad, but fair enough. It’s also rare I can add an answer these days but I’m glad when I can.

Ferret7446

4 hours ago

I submit that what SO is doing is working; it's just that SO is not what some people want it to be.

SO is not a pure Q&A site. It is essentially a wiki where the contents are formatted as Q&As, and asking questions is merely a method to contribute toward this wiki. This is why, e.g., duplicates are aggressively culled.

barbecue_sauce

6 hours ago

But what problem is there with it? Most of the important questions have been answered already.

wholinator2

7 hours ago

We're talking about Stack Overflow, right? The website is a veritable gold mine of carefully answered queries. Sure, some people are shit, but how often are you unable to get at least some progress on a question from it? I find it useful in 90-95% of queries, and I find the answers useful in 99% of queries that match my question. The thing is amazing! I Google search a problem, and there are 5 threads of people with comparable issues; even if no one has my exact error, the debugging and advice around the related errors is almost always enough to get me over the hump.

Why all the hate? AI answers can suck, definitely, but Stack Overflow literally holds the modern industry up. Next time you have a technical problem or error you don't understand, go ahead and avoid the easy answers given on the platform and see how much better the web is without it. I don't understand - what kind of questions do you have?

mvdtnz

4 hours ago

Nobody is criticising the content that is on the site. The problem is an incredibly hostile user base that will berate you if you don't ask your question in the exact right way, or if you ask a question that implies a violation of some kind of best practice (for which you don't provide context because it's irrelevant to the question).

As for the AI, it can only erode the quality of the content on SO.

timhh

9 hours ago

Stackoverflow mods and power users being arseholes reduces the use of Stackoverflow. ChatGPT is just the first viable alternative.

tomrod

9 hours ago

While not exactly the same wording, this was also my first thought.

There have been two places that I remember where the arrogance of the esoterati drives two feedback cycles:

1. People leave after seeking help for an issue they believed needed the input of masters.

2. Because of gruff treatment, the masters receive complaints and indignation, triggering a backfire effect feedback loop, often under the guise of said masters not wanting to be overwhelmed by common problems and issues.

There are a few practical things that can help with this (clear guides to point to, etc.), but the missing element is kindness / non-judgmental responsiveness.

croes

9 hours ago

How can it be an alternative if it needs the data from Stackoverflow?

OgsyedIE

9 hours ago

Because consumers in every market develop models of reality (and make purchasing decisions) on the basis of their best attempts to derive accuracy from their own inevitably flawed perceptions, instead of having perfect information about every aspect of the world?

manojlds

8 hours ago

Easy to keep saying this, but SO was useful because it wasn't the wild west.

weinzierl

7 hours ago

It was useful and not the wild west as long as a very small group of intelligent and highly motivated individuals moderated it. First and foremost Jeff Atwood used to do a lot of moderation himself - not unlike dang on HN.

When that stopped, the site (and to some degree its growing number of sister sites) continued on its ballistic curve, slowly but continuously descending into the abyss.

My primary takeaway is that we have not found a way to scale moderation. SO was doomed anyway; LLMs have just sped up that process.

timhh

5 hours ago

I disagree. It was useful because the UI was (and is!) great. Easy to use markdown input, lists of answers sorted by votes, very limited ads, etc. The gamification was also well done.

Compared to anything before it (endless phpBB forums, expertsexchange, etc.) it was just light years ahead.

Even today, compare the SO UI with Quora's. It's still 10x better.

waynecochran

7 hours ago

I have definitely noticed a large drop in responses on SO. I am old enough to have seen the death of these platforms before. First to go was Usenet, when AOL and its ilk became a thing and every channel turned into spam.

miohtama

9 hours ago

It's an interesting question. The world has had 30 years to come up with a StackOverflow alternative with friendly mods. It hasn't. So the question is: has no one tried hard enough, or can it not be done in the first place?

I am a Stack Overflow mod, dealing with other mods. There is definitely unnecessary hostility there, but 90% of question closures and downvotes go to low-quality questions which lack the professionalism to warrant anyone's time. It is the remaining 10% that turns people off.

We can also take analogs from the death of Usenet.

jprete

8 hours ago

I think the problem isn't specific to SO. Text-based communication with strangers lacks two crucial emotional filters. Before speaking, a person anticipates the listener's reaction and adjusts what they say accordingly. After speaking, they pay attention to the listener's reaction to update their understanding for the future.

Without seeing faces, people just don't do this very well.

shagie

6 hours ago

The Q&A model that Stack Overflow and its various forks follow struggles with the 90/9/1 problem ( https://en.wikipedia.org/wiki/1%25_rule ).

Q&A was designed to handle the social explosion problem and the Eternal September problem by having a larger percentage of the user base take an interest in the community over time and continue to maintain that ideal. Making comments and discussions difficult is part of the design, so that you don't get protracted discussions that in turn need more moderation resources.

The fraction of the people doing the curation and moderation of the site overall has dropped. The reasons for that drop are manyfold. I believe much of it falls squarely upon Stack Overflow corporate, which acted without considering the second-order effects on the community of people who are interested in the success of the site as they envision it.

Ultimately, Stack Overflow has become too successful and the people looking to it now have a different vision for what it should be that comes into conflict with both the design of the site and the vision of the core group.

While Stack Overflow can thrive with a smaller number of people asking "good" (yes, very subjective) questions it has difficulty when it strays into questions that need discussion (which its design comes into conflict with) or too many questions for the committed core group to maintain. Smaller sites can (and do) have a larger fraction of the user base committed to the goals of the site and in turn are able to provide more individual guidance - while Stack Overflow has long gone past that point.

---

Stack Overflow and its often-copied Q&A format work for certain-sized user bases. The site needs enough people to keep it interesting, but it fails to scale when too many people participate who have a different idea of what questions should be there.

There is a lack of moderation tools for the core user base to manage it at scale (you will note that the history of Stack Overflow has been one of removing and restricting moderation tools until things get "too" bad - see also the removal of 20k users helping with flag handling and the continued rescoping of close reasons).

Until someone comes up with a fundamentally different approach that is able to handle moderation at scale, or sufficient barriers for new accounts (to handle the Eternal September problem), we are going to continue to see Stack Overflow clones sprout and die on the vine, along with a continued balkanization of knowledge into smaller areas that are able to handle vision and moderation at a smaller scale.

---

Every attempt at a site I've seen since (and I include things like Lemmy, which did a "copy Reddit" and then worried, or not, about moderation) has started from "get popular, then work on the moderation problem", which is ultimately too late to really solve the problem. The tools for moderation need to be baked into the design from the start.

rkncland

9 hours ago

ChatGPT plagiarizes the answers of those whom you call "arseholes". How is using Stackoverflow in read-only mode different from using ChatGPT?

Except of course that reading Stackoverflow directly has better retention rates, better explanations and more in-depth discussions.

(My view is that moderators can be annoying but the issue is overblown.)

verdverm

8 hours ago

Plagiarizing means violating copyright, loosely speaking. When you, as a human, use SO, you assign your rights to the content to SO. That company is licensing the content to 3rd parties, including those who want to train their LLMs.

What I find is that the LLMs are not spitting out SO text word for word, as one would when plagiarizing. Rather, the LLM uses the context and words of my question when answering, making the response specific and cohesive (by piecing together answers from across questions).

tomrod

8 hours ago

I thought plagiarizing was producing new work substantially copied from prior work, regardless of who owns the copyright? I thought this because self-plagiarism exists.

verdverm

8 hours ago

Well, if we could not reproduce, with changes, what others have written and we have learned, it is unlikely we could make real progress. There are many more concepts, like fair use, meaningful changes, and other legalese, as well as ways people use the term "plagiarize" differently. I had never heard of this "self-plagiarizing" concept; it seems like something fringe that would not be enforceable other than in the court of public opinion or in the classroom via grades.

tomrod

7 hours ago

You're one of today's lucky 10,000! https://xkcd.com/1053/

It's a core issue in academia and other areas where the output is heavily the written word.

[0] https://en.wikipedia.org/wiki/Plagiarism#Self-plagiarism

[1] https://ori.hhs.gov/self-plagiarism

[2] https://www.aje.com/arc/self-plagiarism-how-to-define-it-and...

verdverm

4 hours ago

Reproducing sections is useful in academic publishing. I saw it while reading 100s of papers during my PhD.

(1) If the paper is your entry point into an area of research, or a group, it is useful context on first encounter

(2) If you are not, then you can easily skip it

(3) Citing, instead of reproducing sections like background work, means you have to go look up other papers, meaning a paper can no longer stand on its own.

Self-plagiarism is an opinion held among a subset of academics, not something widely discussed or debated. Are there bad apples? Sure. Is there a systemic issue? I don't think so.

hifromwork

9 hours ago

>Stackoverflow mods and power users being arseholes reduces the use of Stackoverflow

While they are certainly not perfect, they willingly spend their own spare time to help other people for free. I disagree with calling them arseholes.

tomrod

8 hours ago

A lot of people comment on online forums for free and are arseholes there too. Not in this thread so far that I've read, to be clear, but it certainly happens. How would you qualify the difference?

timhh

4 hours ago

The people I am referring to are not helping. At this point they are making SO worse.

The problems are two-fold:

1. Any community with volunteer moderators attracts the kind of people you don't want to be moderators. They enjoy rigidly enforcing the rules even if it makes no sense.

2. There are two ways to find questions and answer them: random new questions from the review queue, and from Google when you're searching for a problem you have. SO encourages the former, and unfortunately the vast majority of questions are awful. If you go and review questions like this you will go "downvote close, downvote close, downvote close". You're going to correctly close a load of trash questions that nobody cares about and a load of good questions you just don't understand.

I've started recording a list of questions I've asked that get idiotic downvotes or closed, so I can write a proper rant about it with concrete examples. Otherwise you get people dismissing the problem as imaginary.

These mods now hold SO hostage. SO is definitely aware of the problem but they can't instigate proper changes to fix it because the mods like this situation and they revolt if SO tries to remedy it.

romeros

9 hours ago

That's just cope. I stopped using Stack Overflow because I get everything from ChatGPT/Claude. It's just a case of having better tech.

Sure, the mods were arseholes etc., but before GPT I never minded using it.