bananaquant
4 hours ago
This to me reads like a poignant commentary on the catastrophic loss of human agency, with the actual commit being highly revealing [0].
Author wants to hide a horizontal scrollbar. Any junior frontend dev worth their salt will be asking right away "where do I stick `overflow-x: hidden;`?" A complete solution will then require hitting "Inspect element" in the browser to find the CSS class, running (rip)grep to find where it lives in the code, and adding a single line there.
An actual proactive programmer might start asking more pointed questions like what content does an empty textbox have that it overflows? And why do I need to insert this workaround that treats the symptom and not the root cause in two different places? Isn't it better to style `textarea` once? Etc, etc.
[0] https://github.com/datasette/datasette-agent/commit/a75a8b72...
biztos
3 hours ago
They might also ask why a bunch of static CSS inside a bunch of JavaScript is hiding inside __init__.py[0] - hopefully before trying to fix some detail of the CSS.
(I'm surprised to see it actually, since my own use of Claude has mostly yielded well-structured code. But I'm not doing proper vibe-coding, more like friendly Socratic arguing with another engineer who happens to be a robot.)
[0] https://github.com/datasette/datasette-agent/blob/main/datas...
simonw
44 minutes ago
Thanks for the prod, I've extracted that script out into a separate static file: https://github.com/datasette/datasette-agent/commit/fa505b82...
(It was in Python because there were a couple of URLs that needed to be dynamically constructed by the server, but those are output as a small window.datasetteAgentJumpConfig object instead now.)
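For anyone curious what "output as a small config object" can look like on the server side, here is a minimal sketch. It is not the code from the linked commit: the function name `jump_config_script` and the keys in the usage note are hypothetical, and only the `window.datasetteAgentJumpConfig` name comes from the comment above.

```python
import json


def jump_config_script(config: dict) -> str:
    """Render server-built values as a tiny inline script tag.

    Hypothetical helper: the static JavaScript file can then read
    window.datasetteAgentJumpConfig instead of being templated in Python.
    """
    payload = json.dumps(config)
    # Escape characters that could close the script element early or be
    # misread by the HTML parser; the JSON value itself is unchanged.
    payload = (
        payload.replace("<", "\\u003c")
        .replace(">", "\\u003e")
        .replace("&", "\\u0026")
    )
    return f"<script>window.datasetteAgentJumpConfig = {payload};</script>"
```

Calling `jump_config_script({"searchUrl": "/-/search"})` (a made-up key and path) returns one script element that can sit ahead of the static file in the page, which keeps the dynamic part down to a couple of lines.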
piker
4 hours ago
This is exactly right. By offloading this trivial task to the LLM, Simon has abandoned the opportunity to evaluate the abstraction with additional information and improve it. Instead, we let the agent spend $12 and make the fix while learning nothing.
simonw
2 hours ago
Things I learned from this:
- Fable will do a whole lot more than you might expect in order to verify a fix. I learned that it's "relentlessly proactive". That's a good title for a blog entry!
- You can take screenshots of a window in macOS using the "screencapture" CLI command, but you'll need the integer window ID first.
- That windowID is accessible via "Quartz.CGWindowListCopyWindowInfo(Quartz.kCGWindowListOptionOnScreenOnly, Quartz.kCGNullWindowID)" using the pyobjc-framework-Quartz library, which installs cleanly via "uv run".
- A neat trick for simulating keyboard shortcuts is to run document.dispatchEvent(new KeyboardEvent("keydown", {key: "/", bubbles: true})); after the page loads.
- You don't need Flask or Starlette to run a CORS-enabled localhost server for capturing JSON from another window - 19 lines of code against the Python standard library http.server package works just fine.
- getComputedStyle(document.querySelector("navigation-search").shadowRoot.querySelector("textarea")) works to read dimensions from inside a Web Component's shadow DOM.
- defaults write com.google.chrome.for.testing AppleShowScrollBars Always
- Claude Fable knows how to apply all of the above. It's always interesting to pick up hints of what a model can and cannot do.
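To make the http.server point concrete, here is a rough sketch of a CORS-enabled localhost server that captures JSON posted from another window. It is not the script the agent wrote (this one is a bit longer than 19 lines); the names `CaptureHandler`, `captured` and `make_server` are made up for illustration.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

captured = []  # JSON payloads received so far (illustrative global)


class CaptureHandler(BaseHTTPRequestHandler):
    def _cors_headers(self):
        # Allow any origin so a page in another window can POST here.
        self.send_header("Access-Control-Allow-Origin", "*")
        self.send_header("Access-Control-Allow-Methods", "POST, OPTIONS")
        self.send_header("Access-Control-Allow-Headers", "Content-Type")

    def do_OPTIONS(self):
        # Answer the browser's CORS preflight request.
        self.send_response(204)
        self._cors_headers()
        self.end_headers()

    def do_POST(self):
        # Read exactly Content-Length bytes and store the decoded JSON.
        length = int(self.headers.get("Content-Length", 0))
        captured.append(json.loads(self.rfile.read(length) or b"null"))
        self.send_response(200)
        self._cors_headers()
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(b'{"ok": true}')

    def log_message(self, *args):
        pass  # keep the terminal quiet


def make_server(port=0):
    # Port 0 asks the OS for any free port; bind to localhost only.
    return HTTPServer(("127.0.0.1", port), CaptureHandler)
```

Running `make_server(8765).serve_forever()` (the port is arbitrary) lets a page send data with `fetch("http://127.0.0.1:8765/", {method: "POST", headers: {"Content-Type": "application/json"}, body: JSON.stringify(data)})`, and each payload ends up in `captured`.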
I'm always confused at how many people equate using a coding agent to solve a problem with "learning nothing". If you pay attention to what it's doing you can learn so much!
piker
37 minutes ago
Sorry that wasn't a criticism of you!
I completely see how it was misread that way. I would edit it now if I could.
I was using you more as an example of a hypothetical programmer using it in this way. If the goal is to create a maintainable product, this isn't a great approach. If the goal is to learn about the model and its behaviors itself, of course this is a fantastic way to experiment. Yes, you might have learned a lot of tricks as a side effect, but avoiding the pain of thinking about, finding and hiding the thing may mask a better abstraction that reduces complexity and allows the project to move forward faster.
almostdeadguy
16 minutes ago
It's like saying you can learn so much about math from using SymPy to solve equations. Yes, you probably can. If you pay close attention to what is happening and can integrate the techniques being used into your knowledge.
But your learnings here are what, a handful of hacks? For most people it's like being shown the chain rule (which frankly, is more general than any of these learnings) without knowing what a derivative is. It's knowledge that comes context free. And even when it can be understood, I'm not sure I believe it gets integrated especially well when you did none of the work to understand it. If you are extremely diligent and self-aware about what your limitations are, and careful to be sure you have an understanding of this knowledge, sure I guess you can learn a lot.
And ultimately what do you think is more likely? People using these tools to advance their knowledge, or people relying on the answers uncritically? I think people with a rosy view about this are severely undercounting the problems associated with the trust relationship between a person and an LLM and what that means.
simonw
8 minutes ago
> I think people with a rosy view about this are severely undercounting the problems associated with the trust relationship between a person and an LLM and what that means.
Personally I think the impact of LLMs on children's education is a crisis right now.
Kids are not going to learn to write if an LLM writes their essays for them. And writing is how you learn to think.
almostdeadguy
7 minutes ago
I don't think it's just a problem for kids! I think this is a problem for many software engineers as well!
saberience
an hour ago
And Fable is still worse than Codex.
I use both and the only thing (as always) that I will use Claude for is UI design.
Opus 4.8 and now Fable are still both worse at actually getting the job done than the Codex model. Claude models write FAR too much code when it's not needed, burn far too many tokens, write unnecessary tests, write plans which are 5 pages longer than needed, etc. etc.
Have you actually compared code quality and plan quality versus Codex? It's demonstrably worse.
elbear
38 minutes ago
Curious, which model do you use for Codex? I'm very happy with the solutions '5.5 high' finds. It's like it understands exactly what I mean and it also anticipates all sorts of situations. Before I used '5.5 medium' for some time and it was a bit underwhelming. It may sound funny but it's like it didn't care that much to do a good job.
felixgallo
44 minutes ago
In my experience writing about 50 programs with fable, opus, and GPT, fable is a significant step change better than opus which is significantly better than GPT. We must be doing different things.
jmmcd
2 hours ago
People are missing that Willison is among the very best people we have in the role of (for lack of a good name): early access to frontier models, evaluate them in real scenarios, no wishful thinking, hype, or doom, communicate the possibilities. Yes he could have fixed this himself but then he would have learned nothing about the AI, and we wouldn't have read a fascinating and important article.
risyachka
2 hours ago
>> he would have learned nothing about the AI
There is absolutely zero value in spending time to learn about new models, as in a few months a new model will be out and whatever you learned about the current one will be useless.
Also, with models getting better and better, you have to know less and less to achieve the same results.
simonw
2 hours ago
My experience has been the exact opposite.
As the models get better you need to know more about their capabilities, because otherwise you risk prompting Claude Fable 5 like it's GPT-4o and complaining loudly about how it's all hype and nothing about these models is improving at all (yes, I do see people say that.)
Getting the best results out of these models requires skill, experience, intuition, and domain expertise. There's always room for improving every one of those.
philipwhiuk
an hour ago
Isn't the whole point of a better model that it should be better at understanding you than the previous one? So the same prompt should return a better answer.
Prompting differently to the new model seems entirely backwards when trying to determine if the model has improved.
simonw
29 minutes ago
It doesn't matter how good the models get, they still won't be able to act on unclear directions.
Learning to provide unambiguous, clear directions is a skill. A lot of people who report bad experiences with models aren't yet good at that skill.
More importantly though, the key to successful communication is having a good understanding of what the other side of the conversation already knows and understands.
Saying "use uv and inline script dependencies" won't mean anything to a model with a knowledge cutoff date prior to the launch of uv!
dasil003
12 minutes ago
I think this was true when models were going from bad to pretty good, as happened last year. But when they start to get good, and can work deeper and with more nuance, how you prompt also can change the results quite a bit. Note this is also true of asking smart humans to do things; personality and approaches vary, they don’t exist on a single-axis continuum of quality.
ViscountPenguin
an hour ago
Eh, I've had the exact opposite experience.
Way back before instruct models it was pretty difficult, but for the last couple of years I haven't needed anything more complex than the type of text that I might send in a detailed email to a colleague.
Dumblydorr
an hour ago
There’s zero value? Surely you don’t believe zero, it’s potentially the most powerful predictive AI in the world ever made? Maybe only incremental steps sure. But also their IPO is coming, you don’t want people evaluating them beforehand?
fragmede
an hour ago
you know, women make a big deal about you meeting their father/parents, and honestly, I'm too autistic to really fucking have put any importance until now as to why that was remotely important, but if N+1 is coming for your job, it seems it might be worth your while to know the capabilities of N, no?
discordance
3 hours ago
I see it as a prioritization exercise. I know the above is a trivial example, but more generally, does the guy who wrote Datasette and Django want to wrangle front end and css, or do they want to work on something else?
oulipo2
3 hours ago
And ruin the planet with more heating, CO2, and wasted water
simonw
an hour ago
Here's a handy calculator you can use to estimate how much CO2 and water I wasted with my coding agent session: https://www.andymasley.com/visuals/ai-prompt-footprint/
PinkaDunka
19 minutes ago
Not sure what point you wanted to make, but this calculator is quite shocking. GPT 5.5 pro, with "a long document" and 10 requests a day gives 25% of daily CO2 emissions!
Ten coding sessions a day with Opus is still 4.7%!
This feels enormous. I will definitely stop rolling my eyes when people complain about AI CO2/water usage...
simonw
10 minutes ago
GPT-5.5 Pro is a notoriously expensive model, it's 6x the price of GPT-5.5. Not something to use as a daily driver!
That ten coding sessions a day with Opus number feels more credible to me.
beernet
2 hours ago
Only on a US platform would this comment get downvoted. This is an absolutely legit thought. While I know the administration that you elected does not care about scientific evidence, I want to point you to the current El Nino conditions [1].
[1] https://www.msn.com/en-us/weather/topstories/el-nino-conditi...
_heimdall
an hour ago
That's an interesting choice as a source. It doesn't mention climate change or human impacts at all and describes El Niño as a naturally occurring event.
> The El Nino is a phenomenon that occurs naturally
user43928
2 hours ago
While one can raise environmental concerns about the AI datacenter buildout, I don't think it is fair to say that it "ruins the planet".
I don't think it is a good contribution to the discussion around Simon's LLM use to fix a CSS bug.
vitalyan1234
37 minutes ago
wow, you've managed to include "orange man bad" into your "current thing bad" post! I wish I could upvote your post twice, fellow redditor!
harperlee
2 hours ago
It was posted at 5am in New York... not sure that that was a US view, so the fact that the platform is US-owned doesn't seem so relevant, if there's a global audience.
That being said, I do agree it is a legit thought (and more so, completely on point in the subthread discussing downsides), and that it shouldn't be downvoted.
simonw
2 hours ago
You missed what I think is the most interesting question: why does the bug appear in Safari macOS but not in Firefox, Chrome, or WebKit running inside of Playwright?
(Dozens of people in this thread implying that any web dev should have known to solve it with overflow-x: hidden and not one of them has addressed that browser difference yet.)
gib444
4 hours ago
The 'better' fixes are often for our (human) benefit. These messy fixes serve the AI companies' interests of creating messes that need even more tokens (money) later. Bad and self-serving developers also act the same, creating tech debt