Sorry, GenAI is NOT going to 10x computer programming

59 points, posted 10 hours ago
by kmdupree

102 Comments

koliber

10 hours ago

I am an experienced programmer, and I just recently started using ChatGPT and Cursor to help me code. Some things it does like magic, and it's hard to say what n-fold improvement there is. I'd put the lower limit at 3x and the upper, on certain tasks, at 20x.

The project I am currently working on took me about 16 hours to get to an MVP. A hobby project of similar size that I did a few years ago took me about 80 hours. A lot of the work is NOT coding work that an LLM can help me with.

10x over everything is overstating it. However, if I can take my star developer who is already delivering significantly more value per dollar than my average guys and 3x him, that's quite a boost.

UncleOxidant

9 hours ago

I was doing a short-term contract over the summer porting some Verilog code to a newer FPGA. The Verilog code wasn't well documented and had been written over a period of about 20 years, and there were no testbenches. Since I was making some requested changes to the design, I wanted to be able to simulate it, which requires creating a testbench. Verilog isn't my "native" HDL; I've done a lot more VHDL, so I figured I'd see if an LLM could generate some of the boilerplate code required for a testbench (hooking up wires to the component under test, setting up clocks, etc.).

I was using DeepSeek Coder for this, and to my surprise it generated a pretty decent first-cut testbench. It even collected repeated code into Verilog tasks, and it figured out that there were some registers in the design that could be written to and read from. The testbench compiled the first time, which really surprised me. There were some issues - for example, you can't write to certain registers in the design until you've set a particular bit in a particular register, and it hadn't done that - but I was able to debug this pretty quickly in simulation. I figure it saved me most of a day's work getting the testbench set up and the design simulating. My expectations were definitely exceeded.

torginus

7 hours ago

I love working with Cursor/GenAI code, but there's one claim I see repeated all the time that I've found to be simply false: that junior/bad programmers put out tons of garbage code using GenAI assistants, which seniors later have to clean up.

In my experience (mostly with Claude) generated code tends to be clean, nice looking, but ranges from somewhat flawed to totally busted. It's not ugly, messy code that works, but clean, nice code that doesn't. An inexperienced programmer would probably have no hope of fixing it up.

janalsncm

10 hours ago

Internet of Bugs did a decent side-by-side of four of them.

https://youtu.be/-ONQvxqLXqE

They can get you 95% of the way there, the question is whether fixing the last 5% will take more time than doing all of it yourself. I presume this depends on your familiarity with the domain.

bunderbunder

10 hours ago

You've got to consider second order effects, though.

Where I'm working the star developers don't have a great culture of being conscientious about writing code their non-star colleagues can read and modify without too much trouble. (In fairness, I've worked exactly one place where that wasn't the case, and that was only because avoiding this problem was a personal vendetta for the CTO.)

We gave them Copilot and it only compounded the problem. They started churning out code that's damn near unreadable at a furious pace. And that pushed the productivity of the people who are responsible for supporting and maintaining their code down toward zero.

If you're only looking at how quickly people clear tickets, it looks like Copilot was an amazing productivity boost. But that's not a full accounting of the situation. A perspective that values teamwork and sustainability over army-of-one heroism and instant gratification can easily lead to the opposite conclusion.

Ancapistani

7 hours ago

> Where I'm working the star developers don't have a great culture of being conscientious about writing code their non-star colleagues can read and modify without too much trouble.

I'd argue that you're "starring" the wrong people, then.

I've known and worked with developers who are mind-bogglingly productive - meaning they solve problems very quickly and consistently. Usually, that's at the expense of maintainability.

I've also known and worked with developers whose raw output is much lower, but significantly higher quality. It may take them a week to solve the problem the other group can get out the door in half a day - but there are solid tests, great documentation, they effectively communicate the changes to the group, and six months from now when something breaks anyone can go in and quickly fix it because the problem is isolated and easy to adapt.

I try to be part of the second group; I'd rather get six story points done in a sprint and have them done _right_ than knock out 15 points and kick the can down the road.

I've known exactly one developer who was in both groups - i.e., they are incredibly productive and produce top-tier output at the same time. He was 19 when I met him which would make him 27 or so now. At the time my assumption was that his pace wasn't sustainable over the long term. I should look him up again and see how he's doing these days...

DGCA

7 hours ago

IMO, if your star developers aren’t writing readable, maintainable code (at least most of the time, some things are inherently complex), I don’t think it’s right to call them star developers.

FWIW, the problem you’re talking about isn’t one I’ve encountered very often at most of the companies I’ve worked at, and I’ve worked at small startups and large enterprises. I genuinely wonder why that is.

mattchamb

9 hours ago

Ugh I am dealing with an amazingly productive platform team who churn out so much stuff that the product teams just can’t keep on top of all the changes to tools and tech.

That one team is super productive at the cost of everyone else grinding to a halt.

bunderbunder

9 hours ago

Some of the coolest demos from my Lean Six Sigma training were all about demonstrating that oftentimes the easiest way to increase the end-to-end throughput of your value chain is to find the team member with the greatest personal productivity and force them to slow down.

You don't necessarily even have to do anything more than that. Just impose the rate limit on them and they'll automatically and unconsciously stop doing all sorts of little things - mostly various flavors of corner cutting - that make life harder for everyone around them.

mattchamb

7 hours ago

Haha that’s a great idea and perspective on tackling it.

fwsgonzo

10 hours ago

I think I agree. It's maybe 2x for me after the project has been solidly established. However, it's very useful to have the AI during the beginning, when you're writing a lot of boilerplate or integrating with APIs that you don't know super well. The AI can surprise you and connect some dots.

danieldk

10 hours ago

The lower bound is certainly lower than 3x. I do very domain-specific work (in Rust/Python) and usually the output is useless and ignoring the garbage suggestions can have a higher mental load than not using it. It does work well for boilerplate though and since boilerplate is the most annoying code to write, I'll tolerate it.

I agree that the upper bound is pretty high: with more straightforward MVPs, it can generate large swaths of code for you from a description. It also helps with little DSLs that I haven't memorized; generating a pyproject.toml from a bunch of constraints is so much nicer/quicker than reading the docs again every few months.

handzhiev

10 hours ago

And it's not just the speed. The mental energy I save by having AI do some work instead of doing it myself lets me feel better, keep energy for other things, be more productive, or just get some rest.

ccvannorman

10 hours ago

Just because this article isn't well formed or sourced doesn't make its claim incorrect.

I program daily and I use AI for it daily.

For short and simple programs AI can do it 100x faster, but it is fundamentally limited by its context size. As a program grows in complexity, AI is not currently able to create or modify the codebase successfully (around 2,000 lines is where I found the barrier). I suspect it's due to exponential complexity associated with input size.

Show me an AI that can do this for a complex 10,000-line program and I'll eat my own shorts.

CapeTheory

10 hours ago

Doesn't even take that much. Today I did some basic Terraform with the help of GenAI - it can certainly print out the fundamentals (VPC, subnets) faster than I can type them myself, but the wheels came off quickly. It hallucinated 2 or 3 things (some non-existent provider features, invalid TF syntax, etc).

When you take into account prompt writing time and the effort of fixing its mistakes, I would have been better off doing the whole thing by hand. To make matters worse, I find that my mental model of what was created is nowhere near as strong as it would have been if I did things myself - meaning that when I go back to the code tomorrow, I might be better off just starting from scratch.

leblancfg

10 hours ago

Here's a thought experiment. Think back on how that statement would have sounded to past-you, 3 years ago. You would probably have dismissed it as bullshit, right? We've come a long way since then, both in terms of better, faster, and cheaper models and in how they're being intertwined with developer tooling.

Now imagine 3 years from now.

cj

10 hours ago

You could have said the same for crypto/blockchain 3-4 years ago (or whenever it was at peak hype).

Eventually we realized what is and isn't possible or practical to use blockchain for. It didn't really live up to all the original hype years ago, but it's still a good technology to have around.

It's possible LLMs could follow a similar pattern, but who knows.

__loam

9 hours ago

What good thing has blockchain ever done that isn't facilitating crime or tax evasion?

cj

5 hours ago

It created a speculative asset that some people are passionate about.

However, if you saw the homepage of HN during peak blockchain hype, being a speculative asset / digital currency was seen almost as a side effect of the underlying technology, but that's pretty much all it turned out to be useful for.

falcolas

10 hours ago

As you inadvertently pointed out, AI improvements are not linear. They depend on new discoveries more than they do on iteration. We could either be out of jobs or lamenting the stagnation of AI (again).

leblancfg

9 hours ago

After an innovation phase there is an implementation phase. Depending on the usefulness of the innovation, the integration with existing systems takes time - it is measured in years, even decades. Think back to the '80s and '90s, when it took years to integrate PCs into offices and workspaces.

From your comment, it sounds like you think the implementation phase of LLMs is already over. If so, how did you come to that conclusion?

falcolas

9 hours ago

It's not as if we have no idea how to make use of AI in programming. We've been working on AI in one form or another since the '70s, and have integrated it into our programming workflows for almost as long (more recently in the form of autocomplete using natural language processing and machine learning models). It's already completely integrated into our IDEs, often with options to collate the output of multiple LLMs.

What further implementations of integrating AI and programming workflows have LLMs shown to be missing?

skybrian

10 hours ago

You can imagine all sorts of things, and then something else might happen. You can’t rely on “proof by imagination” or “proof by lack of imagination.”

We shouldn’t be highly confident in any claims about where AI will be in three years, because it depends on how successful the research is. Figuring out how to apply the technology to create successful products takes time, too.

tester756

7 hours ago

Isn't that the same thing that could have been said about autonomous cars in 2014?

Not everything will grow exponentially forever

ryanackley

9 hours ago

GPT-4 has been out for 1.5 years and I haven't seen much improvement in code quality across LLMs. It's still one of the best.

danieldk

10 hours ago

Or you are extrapolating from the exponential growth phase in a sigmoid curve. Hard to say.

denismi

10 hours ago

Ten years ago when Siri/Google/Alexa were launching, I really wouldn't have expected that 2024 voice assistants would be mere egg timers, and frustrating ones at that - requiring considered phrasing and regular repeating/cancelling/yelling to trick them into doing what you want.

A 10x near future isn't inconceivable, but neither is one where we look back and laugh at how hyped we got at that early-20s version of language models.

hyperG

6 hours ago

It is a great point.

It also might be that the language everyone uses 20 years from now - the one that gives a 50x over today - is just being worked on right now, or won't come along for another 5 years.

In the same way, people who thought that humans could never fly were not completely wrong before the airplane. After the airplane, though, we are really talking about two different versions of a "human that can fly".

rlt

10 hours ago

In my very uninformed opinion, all we need is more clever indexing, prompting, and agents that can iteratively load parts of the codebase into their context and make modifications.

Real engineers aren’t expected to hold 10,000 lines of exact code in their head, they know the overall structure and general patterns used throughout the codebase, then look up the parts they need to make a modification.
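A very rough sketch of what that could look like in practice - none of this is a real tool; the `llm()` call and the keyword "index" below are purely hypothetical stand-ins:

```python
from pathlib import Path

def llm(prompt: str) -> str:
    """Hypothetical stand-in for a call to whatever completion API you use."""
    raise NotImplementedError

def relevant_files(task: str, repo: Path, top_k: int = 5) -> list[Path]:
    """Crude 'index': rank source files by how often task keywords appear in them."""
    keywords = {w.lower() for w in task.split() if len(w) > 3}
    def score(p: Path) -> int:
        text = p.read_text(errors="ignore").lower()
        return sum(text.count(k) for k in keywords)
    return sorted(repo.rglob("*.py"), key=score, reverse=True)[:top_k]

def propose_change(task: str, repo: Path) -> str:
    """One iteration: load only the seemingly relevant slice of the codebase into context."""
    excerpts = "\n\n".join(
        f"# file: {p}\n{p.read_text(errors='ignore')}" for p in relevant_files(task, repo)
    )
    return llm(f"Relevant files:\n{excerpts}\n\nTask: {task}\nReply with a unified diff.")
```

A real agent would iterate: apply the diff, run the tests, and feed failures back into the next prompt, rather than holding the whole 10,000 lines in context at once.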

danieldk

10 hours ago

> I suspect it's due to exponential complexity associated with input size.

I am curious how you get to exponential complexity. The time complexity of a normal transformer is quadratic.

Or do you mean that the complexity of dealing with a codebase grows exponentially with the length?

ccvannorman

8 hours ago

Generally speaking, the potential complexity of a program grows exponentially with its size.

A program of 100 lines has the potential to be vastly more than 10x as complex as a program of 10 lines. Consider the amount of information that can be stored in 4 bits vs. 8 bits: 2^4 vs. 2^8, a 16-fold difference for a 2x increase in size.

As that potential complexity grows, current AI's ability to effectively write and modify code at that scale falls away.

abdullin

10 hours ago

I would recommend giving o1-preview a try on coding tasks like this one.

It is one level above Claude 3.5 Sonnet, which currently is the most popular tool among my peers.

carterparks

10 hours ago

It doesn't always 10x my development but on certain problems I can work 30x faster. In due time, this is only going to accelerate more as tooling becomes more closely integrated into dev workflows.

juliendorra

9 hours ago

For comparison VisiCalc and other early spreadsheets like Lotus 123 and Multiplan were 80x multipliers for their users:

“I would work for twenty hours,” Jackson said. “With a spreadsheet, it takes me 15 minutes.”

Not sure any other computing tool ever beat that since?

Source: Steven Levy reporting, Harper’s Bazaar 1984, reprinted in Wired: https://www.wired.com/2014/10/a-spreadsheet-way-of-knowledge...

norir

9 hours ago

The bummer of ai hype is that I think there is so much we can do to improve programming with deterministic tooling. We have seen a remarkable improvement in terms of real time feedback directly in your editing context through IDEs and editor plugins.

We still don't have the next programming language for the ai age. I believe the biggest gains will not come from bigger models but from smarter tools that can use stochastic tools like LLMs to explore the solution space and quickly verify the solution against well-defined constraints that can be expressed in the language.
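As a toy illustration of that generate-and-verify idea (the `llm()` helper is hypothetical, and the "constraint" here is just a set of input/output cases - in a real system it could be types, property tests, or a model checker):

```python
def llm(prompt: str) -> str:
    """Hypothetical stand-in for any code-generating model."""
    raise NotImplementedError

def satisfies(source: str, cases: list[tuple[object, object]]) -> bool:
    """Deterministic check: run the candidate and verify it against known cases."""
    namespace: dict = {}
    try:
        exec(source, namespace)  # the candidate is expected to define solve(x)
        return all(namespace["solve"](x) == expected for x, expected in cases)
    except Exception:
        return False

def explore(spec: str, cases: list[tuple[object, object]], attempts: int = 5) -> str | None:
    """Let the stochastic tool explore the solution space; keep only verified output."""
    for _ in range(attempts):
        candidate = llm(f"Write a Python function solve(x) such that: {spec}")
        if satisfies(candidate, cases):
            return candidate
    return None
```

The point is that the hallucination problem is contained: whatever nonsense the model produces, only output that passes the deterministic check ever reaches the developer.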

When I hear about some of the massive productivity gains people ascribe to ai, I also wonder where the amazement for "rails new" has gone. We already have tools that can 100x your productivity in specific areas without also hallucinating random nonsense (though Rails is maybe not the absolute best counterexample here).

throw4847285

10 hours ago

The reason GenAI has been so helpful for developers is that devs spend most of their time doing grunt work. That is, basic pattern matching: copy pasting chunks of code from one place to another and then modifying those chunks enough such that a couple code reviews will catch the things they forgot to modify. This is a miserable existence, and GenAI can alleviate it, but only because it was grunt work in the first place.

kolme

9 hours ago

> copy pasting chunks of code from one place to another and then modifying those chunks

That's not how you program. I've never seen anyone programming this way unless they were just starting.

Edit: if you find yourself copying chunks and modifying them, you probably need to create an abstraction. Don't mean to be rude, just honestly that is my personal experience.
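A purely illustrative example of the difference (hypothetical CSV exporters, not from any real codebase) - the copy-paste-and-tweak version versus the extracted helper:

```python
# Copy-pasted and tweaked (the pattern being criticized):
def export_users_csv(users):
    lines = ["id,name"]
    lines += [f"{u.id},{u.name}" for u in users]
    return "\n".join(lines)

def export_orders_csv(orders):
    lines = ["id,total"]
    lines += [f"{o.id},{o.total}" for o in orders]
    return "\n".join(lines)

# The abstraction being suggested instead: one helper, parameterized by columns.
def export_csv(rows, columns):
    lines = [",".join(columns)]
    lines += [",".join(str(getattr(row, col)) for col in columns) for row in rows]
    return "\n".join(lines)
```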

marssaxman

9 hours ago

I am so sorry that this is the experience you have. It sounds like such a drag! Of course all grunt work should be automated; that is rather the point of the industry.

I suppose a reason I feel little interest in applying AI to coding may be that my experience is nothing like you describe.

k__

10 hours ago

Yes.

I don't have the impression GenAI can replace me, and I'm not even a 10x Dev.

Yet, it makes many of the daily tasks more bearable.

ayhanfuat

10 hours ago

It must be really hard to be the person who said "Deep learning is hitting a wall" in 2022. This is just doubling down.

Fripplebubby

10 hours ago

Do people generally really believe that GenAI would 10x programmer productivity? I find that surprising, that's not what I've read here on HN, by-and-large. 20-30% seems much more like what people are actually experiencing today. Is that controversial?

mewpmewp2

10 hours ago

It's very dependent on circumstances and what kind of environment someone is in.

At certain very specific tasks it's going to have that 10x+ and more.

Also we've made 1000x+ gains from the times we had punch cards etc.

It will depend on the kind of projects you are working on and the type of environment you are in: is it a very large company, or is it a side project?

In the right cases it's 10x or more; in worse cases it's maybe a 10% gain.

One shift that might happen is that newer companies will make sure they build a culture where AI-first engineering is promoted, which might let it scale further than typical enterprise work currently can. If a company can start small and lean, it will be in a position to get a huge multiplier from AI. It's harder to get to that setup from a pre-existing large corporation's standpoint.

senko

9 hours ago

Gary cites credible facts there (modest gains, AI is not a replacement for clear thinking), although attacking the "10x" claim is a cheap shot because the claim was always nebulous.

However, in light of Gary's other AI related writings[0], it is clear he's not saying "people, be clear headed about the promises and limits of this new technology", he's saying "genai is shit". Yes, I am attacking the character/motive instead of the argument, because there's plenty of proof about the motive.

So I am kind of agreeing (facts as laid out) and vehemently disagreeing (the underlying premise his readers will get from this) with the guy at the same time.

[0]: Here are a few headlines I grabbed from his blog:

- How much should OpenAI’s abandoned promises be worth?

- Five reasons why OpenAI’s $150B financing might fall apart

- OpenAI’s slow-motion train wreck

- Why California’s AI safety bill should (still) be signed into law - and why that won’t be nearly enough

- Why the collapse of the Generative AI bubble may be imminent

manofmanysmiles

10 hours ago

I think this guy has not worked with enough people who aren't that good at programming. GenAI lets me spit out reams of what to me is boilerplate, but to someone more junior might legitimately take weeks. It's a tool that right now empowers people who already are extremely effective. It is not a substitute for deep knowledge, wisdom and experience.

Ifkaluva

10 hours ago

Frankly, he hasn't worked with any programmers at all. Gary Marcus is a retired psychology professor.

eitally

10 hours ago

I think too many SWEs making comments like Gary has are discounting just how many software engineers are employed outside of tech, mostly working on either 1) workflow, 2) CRUD, or 3) reporting/analytics tools inside enterprise IT organizations. Between SaaS & GenAI, a very large percentage of those roles are starting to dry up.

resters

10 hours ago

I think it's already surpassed 10x, and is closer to 25x or 30x. You just have to know what to ask it to do.

mewpmewp2

10 hours ago

It's very circumstance-specific. In my side projects, where I work alone and have the perfect setup for AI tools, it's for sure an order-of-magnitude difference, but at actual work I'm constantly blocked by other teams, bureaucracy, legacy limitations, and decisions, so my hands are tied with or without AI. Short, one-off projects independent of others also see a massive difference.

aerhardt

10 hours ago

It's doing the work of 30 devs at your level? What kinds of things do you program?

maest

10 hours ago

It's more like: are there tasks that would take me 30h but now only take me 1h?

Maybe, but they are rare. Still a welcome boost, though.

freetanga

10 hours ago

I think it peaks at whatever X, but when you weight all the activities in a software delivery project, the averaged impact is diluted, which might be in line with the author and the studies presented (I have not read them yet).

There is a significant chunk of time spent on how to address your problem, how to break it down, thinking how it will perform in real life concurrently, etc. Sitting down and coding like you are on caffeine overdrive is not the full range of “programming” (at least in my neck of the woods).

I see a lot of people mentioning 20x in here. Would they fire 20 devs tomorrow and eat their own dogfood?

vundercind

10 hours ago

I should probably get around to actually using one of these things.

gadflyinyoureye

10 hours ago

I’d love to see people going through a real world walk through using such things. Some people who do cookie cutter common tasks find that an LLM can produce their whole code base. But ask it to make a change and what happens?

falcolas

10 hours ago

IME: I will either get a repeat of something already written (this happens all the time with Java getter/setter boilerplate if the class isn't the one I'm currently working in), or I get some junior-level attempt at what I ask it to do. The second case is the best case, and it still requires a code review with a fine-toothed comb. For example, it loves to suggest plausible-looking but non-existent methods on popular libraries.

It's one of those things where a language-aware autocomplete will often save me more time, because it doesn't try to second-guess me.

abdullin

10 hours ago

This is an example of my prompt to o1 a few days ago. First request produced refactoring suggestions. Second request (yes, I like results, implement them) produced multiple files that have just worked.

---

Take a look at this code from my multi-mode (a la vim or old terminal apps) block-based content editor.

I want to build on the keyboard interface and introduce a simple way to have simple commands with small popup. E.g. after doing "A" in "view" mode, show user a popup that expects H,T,I, or V.

Or, after pressing "P" in view mode - show a small popup that has an text input waiting for the permission role for the page.

Don't implement the changes, just think through how to extend existing code to make logic like that simple.

Remember, I like simple code, I don't like spaghetti code and many small classes/files.

JHonaker

10 hours ago

I've been looking for such a walkthrough for months. I can't find anyone actually showing them working and building something. There are plenty of, "look at this code this thing spit out that successfully builds," but I've not been able to find anything I would consider truly showing someone building something novel.

If anyone has a YT video or a detailed article, please show me!

nonameiguess

9 hours ago

Statements like this make no sense to me. I get that your personal experience may be that you can type 30x as much code per unit of time, but where is the 30x better software? Where even is the 30x more software? I've never been part of any project I can think of where how fast you can produce code was even a meaningful bottleneck to delivery compared to all the other impediments. Granted, I've largely worked in fairly high-security environments, but still.

resters

7 hours ago

For types of code that LLMs are "good at", the results are quite good if the LLM is told pretty much how to solve the problem. Generally they can map English language text into working code modules, can write tests for the modules, etc. You still have to plan out the approach and keep track of how it's supposed to be working or else they can be quite error prone and arbitrarily remove needed code or randomly forget important details. But once those constraints are understood it can often do quite well.

At this point, if we get larger context windows, LLMs become able to break down problems and fetch relevant code into context when needed, and perhaps get some oversight training about not removing functionality, the quality will get much, much better.

But already they are a massive time saver and they are good at some of the more boring aspects of writing code, leaving the engineer to do more systems thinking and problem solving.

dgoodell

10 hours ago

So far it has definitely saved me a lot of time googling API docs and Stack Overflow.

Anything that requires applying things in novel ways that doesn’t have lots of examples out there already seems to be completely beyond it.

Also, it often comes up with very suboptimal and inefficient solutions, even though it seems to be completely aware of better solutions when prodded.

So basically it’s a fully competent programmer lol.

m463

6 hours ago

My first job at college had schedules with high-level and detailed design, then coding, then integration and test.

Coding was only 3-5% of the time.

Also, I remember learning to race. I tried to learn to go faster through the turns.

But it turns out the way to be fast was to go a little faster on the straightaway and long turns. Trying to go faster through the tight turns doesn't help much.

If you go 1 mph faster for 10 feet in the slowest turn, it doesn't make as much difference as going .1 mph faster in a long 500 foot turn.

so...

Are we trying to haphazardly optimize for the small amount of time we code, when we should spend it elsewhere?

Jimmc414

10 hours ago

"Gary Marcus has been coding since he was 8 years old, and was very proud when his undergrad professor said he coded an order of magnitude faster than his peers. Just as Chollet might have predicted, most of the advantage came from clarity — about task, algorithms and data structures."

I'm trying to understand what sort of person would end with a 3rd person anecdote like this.

hackermatic

10 hours ago

I hope commenters will dig into the author's citations' data, in line with HN's discussion guidelines, instead of just expressing a negative opinion about the thrust of the article. The quantifiable impact of genAI on code productivity is an important research question, and very much an open question subject to bias from all the players in the space -- especially once you factor in quality, maintainability, and bugs or revisions over time.

The GitClear whitepaper that Marcus cites tries to account for some of these factors, but they're biased by selling their own code quality tools. Likewise, GitHub's whitepapers (and subsequent marketing) tend to study the perception of productivity and quality by developers, and other fuzzy factors like the suggestion acceptance rate -- but not bug rate or the durability of accepted suggestions over time. (I think perceived productivity and enjoyment of one's job are also important values, but they're not what these products are being sold on.)

addisonj

9 hours ago

I want to make clear that I agree with the author that the current way in which AI is being used to make a 10x improvement in efficiency is NOT going to work, and it is for exactly the reason stated:

> 10x-ing requires deep conceptual understanding – exactly what GenAI lacks. Writing lines of code without that understanding can only help so much.

IMHO, the real opportunity with AI is to make developers less critical as the only ones who translate conceptual understanding of the business problem into code, and to get developers more focused on the domain they should be experts in: the large-scale structure of the system and how that maps to actual compute.

What I mean by that: developers are, on average, not good at having a deep conceptual understanding of the business domain, though they may have that understanding in the computational domain. All of the 10x developers I have ever met are mostly that because they do know both the business domain (or can learn it really quickly) and the computational domain, so they can quickly deliver value because they understand the problem end-to-end.

My internal model for thinking about this is that we need developers thinking about building the right "intermediate representation" of a problem space and able to manage the system at large, while domain experts, likely supported by AI assistants, use that IR to express the actual business problem. That is a super loose explanation, but if you have been in the industry a while, you have probably felt the insanity of how long it can take to ship a fairly small feature because of a huge disconnect between the broader structure of the system (with microservices, async job systems, data pipelines, etc., which we can think of as the low-level representation of the system at large) and the high-level requirement of needing to store some data and process it according to some business-specific rules.

I have no idea if we actually can get there as an industry... it is more likely we just have more terrible code... but maybe not

tobyhinloopen

10 hours ago

The real reason ChatGPT saves time is that it provides inaccurate answers faster than Google does.

rekttrader

10 hours ago

As a CTO of a new startup, I am not hiring contractors or employees to build software. This may not be 10x, but when the time comes and I need scale I’m likely to hire what will be my smallest team yet.

One of the requirements for hiring is that they be proficient with copilot, cursor, or have rolled their own llama based code assistant.

It's like going from manual hand tools to power drills and nail guns. Those that doubt that AI will change all technology jobs and work in the industry are gonna find themselves doing something else.

yodon

9 hours ago

> One of the requirements for hiring is that they be proficient with copilot, cursor, or HAVE ROLLED THEIR OWN LLAMA BASED CODE ASSISTANT. [emphasis added]

Isn't that a bit like "I won't hire a dev unless they've written a C++ compiler"? Very dev macho, but not necessarily the best criteria for someone working on a distributed financial transactional system.

throwaway918299

10 hours ago

My average experience reviewing code written by “average developers” that start using copilot is that it ends up 0.1x them.

We are going to be so screwed in 10 years.

ratedgene

10 hours ago

I believe it's changed everything forever. I'm not even on the hype train. It's difficult for humans to estimate that impact, so it's easier to just deny it. It's a fundamental shift on so many levels that we haven't even begun to scratch the surface, and it matters not just to engineering but to all other facets of life. The future is accelerating.

rkunal

10 hours ago

I will believe GenAI has become a 10x multiplier when we start noticing a larger number of Emacs packages, or at least a few attempts to rewrite old, famous software like Calibre. I am sure it is being used at a large scale in web apps and cloud tech; I am not aware of the same in other domains.

Maybe in the future everything will become a web app because GenAI became so good at it.

janalsncm

10 hours ago

If you want to know how valuable GenAI is for software engineering, you don’t ask AI hype people on Twitter. You also don’t ask Gary Marcus. You ask people trying to complete software engineering tasks.

I guess part of the reason we can’t agree on whether they are useful is that their usefulness depends on how familiar you are with the programming task.

codingwagie

10 hours ago

I'll never understand people that cannot project current progress into what we may see with further progress.

KaiserPro

10 hours ago

I've never understood why people are projecting infinite and uniform growth of x.

The path isn't linear.

I'm lucky to have some of the best AI integrations around; most of the time they suggest boneheaded bollocks that block out the proper, actual autocomplete.

It is good at suggesting code hints outside of the IDE.

dom96

10 hours ago

I'll never understand people that cannot imagine progress hitting a wall.

Where are our flying cars?

grumpopotamus

10 hours ago

I was just thinking about how common this is, even for experts in a field. The line of thinking goes something like: 1. Observe the current state of the art. 2. Make a claim implying that we will NEVER advance beyond the current state of the art (often despite accelerating progress in the field).

xnx

9 hours ago

There's a big difference between "AI isn't 10x-ing computer programming" and "AI will not 10x computer programming". Am I missing the part of the article where some evidence is provided that the situation won't improve?

throwawa14223

4 hours ago

I'd be shocked if it wasn't a negative at the end of the day.

richard___

10 hours ago

This guy doesn't code and is breaking his back looking for data to support his biased hypothesis.

aerhardt

10 hours ago

I code, I use GenAI every day, all day, and it is not even close to 10x-ing my productivity. Maybe 1.3x?

tobyhinloopen

10 hours ago

For me, 1.2 of that 1.3 is because it's the fastest way to get somewhat-working autocomplete and inline documentation, hah.

zarzavat

10 hours ago

There seem to be many people who are at pains to tell us what generative AI won't do in the future.

Any such prediction seems extremely likely to be false without a time bound, or an argument to physical impossibility.

munchausen42

10 hours ago

Funny to see how being anti-GenAI and anti-LLM is now en vogue on HN. Can't wait till that dies off as well.

kylehotchkiss

9 hours ago

People have been through a lot of hype cycles, and the grandiose claims of what AI/LLMs could be / "superintelligence" etc. are a complete distraction from a pretty impressive tech accomplishment. Too few people wonder if the wins we got are worth it even if we hit a wall trying to achieve "AGI". And the real-world impacts, from electricity use to billionaires running around with no guardrails, are slowly starting to come to light. It sure looks a lot like the last hype cycle, where we were promised a global currency without government intervention... lol. Do you remember the part where people took out loans to buy cartoon apes?

(I personally/philosophically don't believe AGI is a natural next step for LLMs, as I don't believe the English language alone, which training leans on so heavily, encapsulates all of human ability; rather, it's very honed in on English-speaking countries/cultures. I also don't believe humans are very capable of creating derivative products with capabilities greater than their own; we can barely make progress on what really causes mental illness [1], so how can we claim to understand our minds so well that we can replicate their functionality?)

[1]: https://www.science.org/content/blog-post/new-mode-schizophr...

yawnxyz

10 hours ago

Keep telling other devs to stop using AI for code!!! I don't want to lose my edge lol

hiddencost

10 hours ago

Why do people keep posting this stuff? He's been writing variations on the same essay for decades and he's always wrong.

baanist

10 hours ago

What is he wrong about?

hyperG

6 hours ago

I have never read anything by him that seemed correct.

He really does seem like quite a good contrarian indicator if anything.

baanist

5 hours ago

He was right that scaling would not achieve abstract reasoning, and he's been right so far on basically every new hyped development. Closed research labs like OpenAI are hoping to reduce all the gaps in their models by constantly patching out-of-distribution data sets, but this clearly cannot achieve any sort of general intelligence unless they somehow manage to obsolete themselves and the entire company by automating the out-of-distribution patching itself, which they now perform by burning lots of cash and energy.

There are people who need to write a lot of emails, so for those people I'm sure OpenAI will continue to deliver some kind of value, but everyone else will still have to keep thinking for themselves regardless of what Sam Altman keeps promising.

krallistic

9 hours ago

"Deep learning is hitting a wall now with just scaling"

"Deep learning is only good for perception" (with language one of the areas where its not good)

baanist

9 hours ago

I don't have any subscriptions to the latest models but what improvements have you noticed in scaling and language understanding? Last I checked people were still discussing "9.9 > 9.1" and "How many 'r's are in 'strawberry'".

sva_

10 hours ago

This guy really tries too hard to make himself relevant.

baanist

10 hours ago

I think he does a pretty good job of it. I know who he is and I read what he has to say on all the latest hype trends in AI.

rogerclark

10 hours ago

It already has, for better or worse. Why does anyone still take this guy seriously? It's one thing to be skeptical that AI is going to make the world a better place... it's another thing to be skeptical that it exists and actually does things.

hluska

10 hours ago

I can only speak about what it’s done to productivity. A Gen AI is a very ambitious intern who will do anything I ask. I can keep making it refine things, or take what it has and finish it myself. There are no hurt feelings and I don’t have to worry about keeping my Gen AI motivated.

It certainly doesn’t write production quality code, but that entire first paragraph was science fiction a short time ago.

dmitrygr

10 hours ago

For low level work (kernels, drivers, C, assembly), things like ChatGPT and Cursor are a 0.1X multiplier. They suggest idiotic things both design-wise and implementation-wise. In this world, I’ll take my cat’s help over theirs. Maybe it’ll improve, but I won’t hold my breath.

mahmoudimus

10 hours ago

That's sort of ironic. One of the biggest value adds I've seen to my hobby work is that LLMs are REALLY good at understanding disassembly and can even do things like symbolically execute a few lines for me. I've used it to deobfuscate several malware protections and I can imagine it will get better.

However, I have never tried it on kernels, drivers, etc. Just the inverse of that :)

baanist

9 hours ago

These algorithms are not capable of symbolic reasoning and abstract interpretation. The most obvious demonstration of this is that no model on the market can currently solve sudoku puzzles reliably, even though billions of dollars have been spent "training" them on logic and reasoning.
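For contrast, the kind of explicit constraint search being referred to is trivial for a conventional program, which is what makes the models' struggles with it notable. A standard backtracking sketch (purely illustrative, not tied to any model or library):

```python
def valid(board, r, c, v):
    """Check the classic row, column, and 3x3-box constraints for placing v at (r, c)."""
    if v in board[r]:
        return False
    if any(board[i][c] == v for i in range(9)):
        return False
    br, bc = 3 * (r // 3), 3 * (c // 3)
    return all(board[br + i][bc + j] != v for i in range(3) for j in range(3))

def solve(board):
    """Plain depth-first search with backtracking; board is a 9x9 list of lists, 0 = empty."""
    for r in range(9):
        for c in range(9):
            if board[r][c] == 0:
                for v in range(1, 10):
                    if valid(board, r, c, v):
                        board[r][c] = v
                        if solve(board):
                            return True
                        board[r][c] = 0
                return False
    return True  # no empty cells left: solved
```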

geenkeuse

7 hours ago

Looks like you guys are coming around. For the longest time it was the same "AI will never ever do what we do" song.

Now you are all mostly singing its praises.

Good. You will be left standing when the stubborn are forced to call it quits.

What is so wrong with having a computer do amazing things for you simply by asking it?

That was the dream and the mission, wasn't it?

Unless "obfuscation is the enemy"

Most people don't want the IT guy to get into the details of how they did what they did. They just want to get on with it. Goes for the guy replacing your hard drive. Goes for the guy writing super complicated programs. IT is a commodity. A grudge purchase for most companies. But you know that, deep down. Don't you?

The bar has been lowered and raised at the same time. Amazing.