fny
21 hours ago
I fear the conceptual churn we're going to endure in the coming years will rival frontend dev.
Across ChatGPT and Claude we now have tools, functions, skills, agents, subagents, commands, and apps, and there's a metastasizing complex of vibe frameworks feeding on this mess.
libraryofbabel
20 hours ago
You forgot mcp-everything!
Yes, it's a mess, and there will be a lot of churn, you're not wrong. But there are foundational concepts underneath it all that you can learn, and then it's easy to fit insert-new-feature into your mental model. (Or you can just ignore the new features and roll your own tools. Some people here do that with a lot of success.)
The foundational mental model to get the hang of is really just:
* An LLM
* ...called in a loop
* ...maintaining a history of stuff it's done in the session (the "context")
* ...with access to tool calls to do things. Like, read files, write files, call bash, etc.
Some people call this "the agentic loop." Call it what you want, you can write it in 100 lines of Python. I encourage every programmer I talk to who is remotely curious about LLMs to try that. It is a lightbulb moment.
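For the curious, here's roughly what that looks like. A minimal sketch using the Anthropic Python SDK; the model name and the single bash tool are illustrative, and a real agent needs sandboxing and approval gates before it runs anything:

```python
import subprocess
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

# One tool: run a shell command. (Illustrative; don't ship this without sandboxing.)
tools = [{
    "name": "bash",
    "description": "Run a shell command and return its output.",
    "input_schema": {
        "type": "object",
        "properties": {"command": {"type": "string"}},
        "required": ["command"],
    },
}]

messages = [{"role": "user", "content": input("> ")}]  # the "context"

while True:  # the agentic loop
    response = client.messages.create(
        model="claude-sonnet-4-5",  # assumption: any tool-capable model works
        max_tokens=4096,
        tools=tools,
        messages=messages,
    )
    messages.append({"role": "assistant", "content": response.content})

    if response.stop_reason == "tool_use":
        # Execute each requested tool call and append the results to the history.
        results = []
        for block in response.content:
            if block.type == "tool_use":
                proc = subprocess.run(block.input["command"], shell=True,
                                      capture_output=True, text=True)
                results.append({"type": "tool_result", "tool_use_id": block.id,
                                "content": proc.stdout + proc.stderr})
        messages.append({"role": "user", "content": results})
    else:
        # No tool requested: print the reply and hand control back to the user.
        print("".join(b.text for b in response.content if b.type == "text"))
        messages.append({"role": "user", "content": input("> ")})
```

Add read_file and write_file tools and you have the skeleton of a coding agent.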
Once you've written your own basic agent, if a new tool comes along, you can easily demystify it by thinking about how you'd implement it yourself. For example, Claude Skills are really just:
1) Skills are a bunch of files with instructions for the LLM in them.
2) Search for the available "skills" on startup and put all the short descriptions into the context so the LLM knows about them.
3) Also tell the LLM how to "use" a skill. Claude just uses the `bash` tool for that.
4) When Claude wants to use a skill, it uses the "call bash" tool to read in the skill files, then does the thing described in them.
And that's more or less it, glossing over a lot of things that are important but not foundational, like granular tool permissions.
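To see why this demystifies so easily, steps 1-3 are something like the sketch below. The skills/<name>/SKILL.md layout and the first-line-is-the-summary convention are my assumptions, not Anthropic's actual implementation:

```python
from pathlib import Path

SKILLS_DIR = Path("skills")  # hypothetical layout: skills/<name>/SKILL.md

def build_skills_preamble() -> str:
    """Steps 1-2: find the skills on startup, collect their short descriptions."""
    lines = ["You have skills available. Before using one, read its file with bash:"]
    for skill_file in sorted(SKILLS_DIR.glob("*/SKILL.md")):
        description = skill_file.read_text().splitlines()[0]  # assume line 1 is the summary
        lines.append(f"- {skill_file.parent.name}: {description} ({skill_file})")
    return "\n".join(lines)

# Step 3: this goes into the base/system prompt. Step 4 needs no new code at all;
# the LLM just uses its existing bash tool to read the skill file when needed.
system_prompt = build_skills_preamble()
```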
skissane
8 hours ago
> You forgot mcp-everything!
One great thing about the MCP craze is that it has given vendors a motivation to expose APIs they didn't offer before. Real example: Notion's public REST API lacks support for duplicating pages. Yes, their web UI can do it, by calling their private REST API, but their private APIs are complex, undocumented, and could stop working at any time with no notice. Then they added it to their MCP server. And MCP is just a JSON-RPC API: you aren't limited to invoking it from an LLM agent, you can also invoke it from your favourite scripting language with no LLM involved at all.
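To make that concrete, a no-LLM call looks roughly like this. A sketch only: the endpoint, token, tool name, and arguments are all invented, and a real client performs an initialize handshake first:

```python
import requests

# MCP is JSON-RPC 2.0 under the hood, so any HTTP client can speak it.
response = requests.post(
    "https://mcp.example.com/mcp",  # hypothetical remote MCP endpoint
    headers={
        "Authorization": "Bearer <token>",
        "Accept": "application/json, text/event-stream",
    },
    json={
        "jsonrpc": "2.0",
        "id": 1,
        "method": "tools/call",
        "params": {
            "name": "duplicate_page",            # invented tool name
            "arguments": {"page_id": "abc123"},  # invented arguments
        },
    },
)
print(response.json())
```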
libraryofbabel
5 hours ago
I remember reading, in one of Simon Willison's recent blog posts, his half-joking point that MCP got so much traction so fast because adding a remote MCP server let tech management at big companies, whose C-suite is asking them for an "AI Strategy", show that they were doing something. I'm sure that is a little bit true: a project framed as "make our API better and more open and well-documented" would likely never have got off the ground at many such places. But that is exactly what this is, really.
At least it's something we all reap the benefits of, even if MCP is really mostly just an API wrapper dressed up as "Advanced AI Technology."
alvis
6 hours ago
Well, I bet Notion simply forgot that some of those APIs were private before. I started developing with the Notion APIs on the first day they were released. They ship constant updates and I have seen lots of improvement. There's just no reason they would intentionally offer the duplicate-page capability via MCP but not the API.
PS: Just want to say, the Notion MCP is still very buggy. It can't handle code blocks or large pages very well.
aabhay
7 hours ago
Amazing example. AI turns the begrudging, third-rate API UX into a must-win agent UX.
mooreds
6 hours ago
and we all win!
delgaudm
an hour ago
> Some people call this "the agentic loop." Call it what you want, you can write it in 100 lines of Python
That description sounds a lot like PocketFlow, an AI/LLM development framework based on a loop that's about 100 lines of Python:
https://github.com/The-Pocket/PocketFlow
(I'm not at all affiliated with Pocket Flow, I just recall watching a demo of it)
ibejoeb
19 hours ago
Pretty true, and definitely a good exercise. But if we're going to actually use these things in practice, you need more: things like prompt caching, capabilities/constraints, etc. It's pretty dangerous to let an agent go hog wild in an unprotected environment.
libraryofbabel
19 hours ago
Oh sure! And if I was talking someone through building a barebones agent, I'd definitely tack on a warning along the lines of "but don't actually use this without XYZ!" That said, you can add prompt caching by just setting a couple of parameters in the API calls to the LLM. I agree that constraints are a much more complex topic, although even in my 100-line example I am able to fit in a user approval step before file writes or bash actions.
apsurd
18 hours ago
When you say prompt caching, does it mean caching the thing you send to the LLM or the thing you get back?
Sounds like the prompt is what you send, and caching is important here because what you send is derived from previous responses from earlier LLM calls?
Sorry to sound dense, I struggle to understand where and how in the mental model the non-determinism of a response is dealt with. Is it just that it's all cached?
libraryofbabel
18 hours ago
Not dense to ask questions! There are two separate concepts in play:
1) Maintaining the state of the "conversation" history with the LLM. LLMs are stateless, so you have to store the entire series of interactions on the client side in your agent (every user prompt, every LLM response, every tool call, every tool call result). You then send the entire previous conversation history to the LLM every time you call it, so it can "see" what has already happened. In a basic agent, it's essentially just a big list of strings, and you pass it into the LLM api on every LLM call.
2) "Prompt caching", which is a clever optimization in the LLM infrastructure to take advantage of the fact that most LLM interactions involve processing a lot of unchanging past conversation history, plus a little bit of new text at the end. Understanding it requires understanding the internals of LLM transformer architecture, but the essence of it is that you can save a lot of GPU compute time by caching previous result states that then become intermediate states for the next LLM call. You cache on the entire history: the base prompt, the user's messages, the LLM's responses, the LLM's tool calls, everything. As a user of an LLM api, you don't have to worry about how any of it works under the hood, you just have to enable it. The reason to turn it on is it dramatically increases response time and reduces cost.
Hope that clarifies!
apsurd
17 hours ago
Very helpful. It helps me better understand the specifics behind each call and response, the internal units, and whether those units are sent and received "live" from the LLM or come from a traditional DB or cache store.
I'm personally just curious how far, clever, insightful, any given product is "on top of" the foundation models. I'm not in it deep enough to make claims one way or the other.
So this shines a little more light, thanks!
ayewo
an hour ago
This recent comment https://news.ycombinator.com/item?id=45598670 by @simonw really helped drive home the point that LLMs are just being fed an array of strings.
colordrops
16 hours ago
Why wouldn't you turn on prompt caching? There must be a reason why it's a toggle rather than just being on for everything.
TimMoore
15 hours ago
Writing to the cache is more expensive than a request with caching disabled. So it only makes economic sense to do it when you know you're going to use the cached results. See https://docs.claude.com/en/docs/build-with-claude/prompt-cac...
adastra22
11 hours ago
When you know the context is a one-and-done. Caching costs more than just running the prompt, but less than running the prompt twice.
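The arithmetic, using the commonly cited Anthropic ratios (cache writes at roughly 1.25x the base input price, cache reads at roughly 0.1x; check the current pricing docs before relying on this):

```python
prefix_tokens = 10_000
write_cost = 1.25 * prefix_tokens  # 12,500 token-equivalents to populate the cache
read_cost = 0.10 * prefix_tokens   #  1,000 token-equivalents on each later hit
no_cache = 2 * prefix_tokens       # 20,000 if you just ran the prompt twice
# 12,500 + 1,000 < 20,000: caching pays for itself on the very first reuse
```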
data-ottawa
2 hours ago
It's also a very fun project; you can set up a small LLM with Ollama or LM Studio and get working quickly. With MCP, it's very fast to get to something actually useful.
I’ve done this a few times (pre and post MCP) and learned a lot each time.
KingOfMyRoom
9 hours ago
You have a great way of demystifying things. Thanks for the insights here!
Do you think a non-programmer could realistically build a full app using vibe coding?
What fundamentals would you say are essential to understand first?
For context, I'm in finance, but about 8 years ago I built a full app with Angular/Ionic (live on the Play Store, under review on the App Store at that time) after doing a Coursera specialization. That was my first startup attempt; I haven't coded since.
My current idea is to combine ChatGPT prompts with Lovable to get something built, then fine-tune and iterate using Roo Code (VS Code plugin).
I’d love to try again with vibe coding. Any resources or directions you’d recommend?
Arkhaine_kupo
2 hours ago
> Do you think a non-programmer could realistically build a full app using vibe coding?
For personal or professional use?
If you want to make it public, I would say it's 0% realistic. The bugs, security concerns, performance problems, etc. that you would be unable to fix are impossible to enumerate.
But even if you just had a simple login and kept people's emails and passwords, you can very easily end up with insecure DBs, no protection against simple things like SQL injection, etc.
You would not want to be the face of "vibe coder gives away data of 10k users".
felixhammerl
8 hours ago
If your app just has to display stuff, there are no-code kits available that can help you out. No vibe coding needed.
If your app has to do something useful, it just exploded in complexity and corner cases that you will have to account for and debug. Also, if it does anything interesting that the LLM has not yet seen a hundred thousand times, you will hit the manual button quite quickly.
Claude especially (with all its deserved praise) hallucinates so much crap together, while claiming absolute authority in corner cases, that it can become annoying.
KingOfMyRoom
8 hours ago
That makes sense, I can see how once things get complex or novel, the LLMs start to struggle. I don't think my app is doing anything complex.
For now, my MVP is pretty simple: a small app for people to listen to soundscapes for focus and relaxation. Even if no one uses it, at least it's going to be useful to me, and it will be a fun experiment!
I’m thinking of starting with React + Supabase (through Lovable), that should cover most of what I need early on. Once it’s out of the survival stage, I’ll look into adding more complex functionality.
Curious, in your experience, what’s the best way to keep things reliable when starting simple like this? And are there any good resources you can point to?
ashtonshears
an hour ago
You can make that. The only AI coding tools I have liked are OpenAI Codex and Claude Code. I would start by working with it to create a design document in Markdown to plan the project. Then I would close the app to reset context, tell it to read that file, and create an implementation plan for the project in various phases. Then I would reset context again and have it start implementing. I don't always like that many steps, but for a new user it can help to see ways to use the tools.
ashtonshears
42 minutes ago
Learning how to get it to run build steps was a big boost to my initial productivity when learning the CLI tools.
ashtonshears
an hour ago
Maybe React Native, if you like React.
wouldbecouldbe
7 hours ago
Really depends on the app you want to build.
If I were to vibe code, I wouldn't use Lovable but Claude Code. You can run it in your terminal.
And I would ask it to use NextAuth, Next.js, and Prisma (or another ORM), and connect it with SQLite or an external managed MariaDB server (for easy development you can start with SQLite; for deployment to Vercel you need an external database).
People here shit on Next.js, but due to its extensive documentation and usage the LLMs are very good at building with it, and since it forces a certain structure it generally produces decently structured code that is workable for a developer.
Also, Vercel is very easy to deploy to: just connect GitHub and you are done.
Make sure to use Git properly and commit per feature, or even better branch per feature, so you can easily revert to old versions if Claude messed up.
Before starting, spend some time sparring with the GPT-5 thinking model to create a database schema that's future-proof. The challenge here is finding the right balance between over-engineering and simplicity.
One caveat: be careful letting Claude run migrations on your production database. It can accidentally destroy it, so only point Claude Code at test databases.
KingOfMyRoom
2 hours ago
Thanks a lot for all the pointers.
I’m not 100% set on Lovable yet. Right now I’m using Stitch AI to build out the wireframes. The main reason I was leaning toward Lovable is that it seems pretty good at UI design and layout.
How does Claude do on that front? Can it handle good UI structure or does it usually need some help from a design tool?
Also, is it possible to get mobile apps out of a Next.js setup?
My thought was to start with the web version, and later maybe wrap it using Cordova (or Capacitor) like I did years ago with Ionic to get Android/iOS versions. Just wondering if that’s still a sensible path today.
callamdelaney
6 hours ago
It's all just prompt stuffing in the end.
dlivingston
20 hours ago
> Call it what you want, you can write it in 100 lines of Python. I encourage every programmer I talk to who is remotely curious about LLMs to try that. It is a lightbulb moment.
Definitely want to try this out. Any resources / etc. on getting started?
libraryofbabel
19 hours ago
This is the classic blog post, by Thorsten Ball, from way back in the AI Stone Age (April this year): https://ampcode.com/how-to-build-an-agent
It uses Go, which is more verbose than Python would be, so he takes 300 lines to do it. Also, his edit_file tool could be a lot simpler (I just make my minimal agent "edit" files by overwriting the entire existing file).
I keep meaning to write a similar blog post with Python, as I think it makes it even clearer how simple the stripped-down essence of a coding agent can be. There is magic, but it all lives in the LLM, not the agent software.
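That whole-file-overwrite "editor" is about as small as a tool gets. Something like this (a sketch; the function is just whatever you register as the tool handler):

```python
def write_file(path: str, content: str) -> str:
    """'Edit' a file by having the LLM send back the complete new contents."""
    with open(path, "w") as f:
        f.write(content)
    return f"wrote {len(content)} characters to {path}"
```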
judahmeek
18 hours ago
> I keep meaning to write a similar blog post with Python...
Just have your agent do it.
libraryofbabel
18 hours ago
I could, but I'm actually rather snobbish about my writing and don't believe in having LLMs write first drafts (for proofreading and editing, they're great).
(I am not snobbish about my code. If it works and is solid and maintainable I don't care if I wrote it or not. Some people seem to feel a sense of loss when an LLM writes code for them, because of The Craft or whatever. That's not me; I don't have my identity wrapped up in my code. Maybe I did when I was more junior, but I've been in this game long enough to just let it go.)
jona777than
4 hours ago
I highly relate to this. Code works or it doesn’t. My writing feels a lot more like self expression. I agree that’s harder to “let go” to an agent.
canyon289
16 hours ago
I wrote a post here with zero abstractions. It's all self-contained and runs locally.
https://ravinkumar.com/GenAiGuidebook/language_models/Agents... https://github.com/canyon289/ai_agent_basics/blob/main/noteb...
kvirani
14 hours ago
How does it call upon the correct skill from a vast library of skills at the right time? Is this where RAG via embeddings / vector search comes in? My mental model is still weak in this area, I admit.
visarga
13 hours ago
I think it has a compact table of contents of all the skills it can call preloaded. It's not RAG; it navigates based on references between files, like a coding agent.
libraryofbabel
6 hours ago
This is correct. It just puts a list of skills into context as part of the base prompt. The list must be compact because the whole point of skills is to reduce context bloat by keeping all the details out of context until they are needed. So the list will just be something like: 1) skill name, 2) short (like one sentence) description of what the skill is for, 3) where to find the skill (file path, basically) when it wants to read it in.
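For illustration, the injected list might look something like this (the skill names are invented):

```
Available skills (read the skill file before using one):
- pdf-forms: Fill out PDF forms from structured data. (skills/pdf-forms/SKILL.md)
- brand-styles: Apply company brand guidelines to generated documents. (skills/brand-styles/SKILL.md)
```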
xnx
17 hours ago
Might as well include agent2agent in there: https://developers.googleblog.com/en/a2a-a-new-era-of-agent-...
Der_Einzige
20 hours ago
Tool use is only good with structured/constrained generation
libraryofbabel
20 hours ago
You'll need to expand on what you mean, I'm afraid.
AStrangeMorrow
19 hours ago
I think, from my experience, what they mean is that tool use is only as good as your model's ability to stick to a given answer template/grammar. For example, if it does tool calling using a JSON format, it needs to stick to that format, not hallucinate extra fields, and use the existing fields properly. This has worked for a few years, and LLMs are getting better and better, but the more tools you have, and the more parameters your callable functions take, the higher the risk of errors. There are also systems that constrain the inference itself, for example the outlines package, by changing the way tokens are sampled (this way you can force a model to stick to a template/grammar, but that can also degrade results in some other ways).
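A toy illustration of the failure mode (a sketch using the jsonschema package; the tool schema and the model's malformed output are invented):

```python
import json
import jsonschema

# The declared schema for a bash tool's arguments.
schema = {
    "type": "object",
    "properties": {"command": {"type": "string"}},
    "required": ["command"],
    "additionalProperties": False,  # rejects hallucinated extra fields
}

raw = '{"command": "ls -la", "confidence": 0.9}'  # the model invented a field
try:
    jsonschema.validate(json.loads(raw), schema)
except jsonschema.ValidationError as e:
    print("Malformed tool call, ask the model to retry:", e.message)
```

Constrained generation goes one step further and makes invalid outputs impossible at sampling time, rather than catching them afterwards.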
libraryofbabel
19 hours ago
I see, thanks for channeling the GP! Yeah, like you say, I just don't think getting the tool call template right is really a problem anymore, at least with the big-labs SotA models that most of us use for coding agents. Claude Sonnet, Gemini, GPT-5 and friends have been heavily, heavily RL-ed into being really good at tool calls, and it's all built into the providers' APIs now, so you never even see the magic where the tool call is parsed out of the raw response. To be honest, when I first read about tool calls with LLMs I thought, "that'll never work reliably, it'll mess up the syntax sometimes." But in practice, it does work. (Or, to be more precise, if the LLM ever does mess up the grammar, you never know, because it's able to seamlessly retry and correct without it ever being visible at the user-facing API layer.) Claude Code plugged into Sonnet (or even Haiku) might do hundreds of tool calls in an hour of work without missing a beat. One of the many surprises of the last few years.
zmmmmm
17 hours ago
Yep, the ecosystem is well on its way to collapsing under its own weight.
You have to remember, every system or platform has a total complexity budget that effectively sits at the limit of what a broad spectrum of people can effectively incorporate into their day to day working memory. How it gets spent is absolutely crucial. When a platform vendor adds a new piece of complexity, it comes from the same budget that could have been devoted to things built on the platform. But unlike things built on the platform, it's there whether developers like it and use it or not. It's common these days that providers binge on ecosystem complexity because they think it's building differentiation, when in fact it's building huge barriers to the exact audience they need to attract to scale up their customer base, and subtracting from the value of what can actually be built on their platform.
Here you have a highly overlapping, duplicative concept that's taking a solid chunk of new complexity budget but not really adding a lot of new capability in return. I am sure the people who designed it think they are reducing complexity by adding a "simple" new feature that does what people would otherwise have to learn themselves. It's far more likely they are, at best, breaking even on how many people this deters versus attracts to their platform.
flutetornado
8 hours ago
There are several useful ways of engineering the context used by LLMs for different use cases.
MCP allows anybody to extend their own LLM application's context and capabilities using pre-built *third party* tools.
Agent Skills allows you to let the LLM enrich and narrow down its own context based on the nature of the task it's doing.
I have been using a home-grown version of Agent Skills for months now with Claude in VS Code, using skill files and extra tools in folders for the LLM to use. Once you have enough experience writing code with LLMs, you realize this is a natural direction to take for engineering the context of LLMs. It is very helpful for pruning unnecessary parts from "general instruction files" when working on specific tasks, all orchestrated by the LLM itself. And external tools for specific tasks (such as finding out which cell in a Jupyter notebook contains the code the LLM is trying to edit, for example) make LLMs a lot more accurate and efficient: efficient because they are not burning through precious tokens to do the same work, and accurate because the tools are not stochastic.
With Claude Skills now I don't need to maintain my home grown contraption. This is a welcome addition!
mathattack
21 hours ago
There's so much white space - this is the cost of a brand new technology. Similar issues with figuring out what cloud tools to use, or what python libraries are most relevant.
This is also why not everyone is an early adopter. There are mental costs involved in staying on top of everything.
benterix
20 hours ago
> This is also why not everyone is an early adopter.
Usually, there are relatively few early adopters of a new technology.
But with LLMs, it's quite the opposite: there was a huge number of early adopters. Some got extremely excited and now run hundreds of agents all the time, some got burned and went back to the good old ways of doing things, whereas the majority just uses LLMs from time to time for various tasks, bigger or smaller.
a4isms
20 hours ago
I follow your reasoning. If we just look at businesses, and we include every business that pays money for AI and where one or more employees use AI to do their jobs, then we're in the Early Majority phase, not the Innovator or Early Adopter phases.
https://en.wikipedia.org/wiki/Technology_adoption_life_cycle
mathattack
19 hours ago
There's early adoption from individuals. Much less from enterprises. (They're buying site licenses, but not re-engineering their company processes)
LPisGood
21 hours ago
Metastasizing is such an excellent way to describe this phenomenon. They grow on top of each other.
kbar13
21 hours ago
I'm letting the smarter folks figure all this out and just picking the tools I like every now and then. I like just using Claude Code with VS Code and still doing some things manually.
articsputnik
2 hours ago
Yeah, I'm avoiding all the serialization and deserialization, as I'm already working in Markdown and open text for almost all my stuff. Claude Skills only seem to make sense for people who have their data in multiple different proprietary formats; then it might make sense to package them into another one. But this can get messy pretty quickly!
efields
19 hours ago
same same
MomsAVoxell
7 hours ago
Another filthy casual checking in. Let the kids churn, the froth rises to the top and anyway .. I've got a straw.
SafeDusk
16 hours ago
That is why a minimal framework[0] that allows me to understand the core immutable loop, but also to quickly experiment with all these imperative concepts, is invaluable.
I was able to try Beads[1] quickly with my framework and decided I like it enough to keep it. If I don't like it, I just drop it; they're composable.
[0]: https://github.com/aperoc/toolkami.git [1]: https://github.com/steveyegge/beads
scrollaway
8 hours ago
Yeah Beads is a very nice experience. Useful, easy to set up, easy to drop.
awb
20 hours ago
Hopefully there’s a similar “don’t make me think” mantra that comes to AI product design.
I like the trend where the agent decides what models, tooling, and thought process to use. That seems to me far more powerful than asking users to create solutions for each discrete problem space.
kingkongjaffa
18 hours ago
Where I've seen it be really transformative is giving it additive tools that are multiplicative in utility. So like giving an LLM 5 primitive tools for a specific domain and the agent figuring out how to use them together and chain them and run some tools multiple times etc.
esafak
20 hours ago
On the other hand, this complexity represents a new niche that, for a while at least, will present job and business opportunities.
lukev
17 hours ago
The cool part is that none of any of this is actually that big or difficult. You can master it on-demand, or build your own substitutes if necessary.
Yeah, if you chase buzzword compliance and try to learn all these things outside of a particular use case you're going to burn out and have a bad time. So... don't?
kelvinjps10
19 hours ago
I found that the way Claude now handles tools on my system simplifies stuff. With its CLI usage, I find the Claude Skills model better than MCP.
jessmartin
17 hours ago
Same. Was very excited about MCP, but Claude Code + CLI tools is so much nicer.
MomsAVoxell
7 hours ago
It's fine, just use an AI to organise it all. Soon enough, nobody will need to know anything.
eru
13 hours ago
AI tools can help you with the churn.
AI will help you solve problems you wouldn't have without AI.
DrewADesign
16 hours ago
Not to mention GANs, RAGs, context decoupling, prompt matrices, NAGGLs, first-class keywords, reverse token interrupts, agentic singletons, parallel context bridges…
… jk… I’ll bet at least one person was like “ah, damnit, what did I miss…” for a second.
catgary
20 hours ago
These companies are also biased towards solutions that will more-or-less trap you in a heavily agent-based workflow.
I’m surprised/disappointed that I haven’t seen any papers out of the programming languages community about how to integrate agentic coding with compilers/type system features/etc. They really need to step up, otherwise there’s going to be a lot of unnecessary CO2 produced by tools like this.
typpilol
12 hours ago
I kind of do this by making the LLM run my linter, which has typed lint rules.
The only way I can get any decent code out of them for TypeScript is by having, no joke, 60 ESLint plugins. It forces them to write actually decent code, although it takes them forever.
siva7
17 hours ago
It feels like every week these companies release some new product that feels very similar to what they released the week before. Can the employees at Anthropic themselves even tell what the difference is?
zqna
7 hours ago
I bet that most of those products are created by their own "AI". They must already be using AI product owners, developers, and testers, while their human counterparts just sit there in their chairs, busy training their AI simulations and moderating their output. The next logical step will be AI doing that too, with the human folks hitting the street, then recursively ad infinitum. They will reach the glorified singularity really soon!
amelius
17 hours ago
These products are all cannibalizing each other, so it's a bad strategy.
Trias11
20 hours ago
Right.
I focus on building projects that deliver some specific business value and pick the tools that get me there.
There is zero value in spending cycles engaging in new-tool hype.
dalmo3
20 hours ago
For Cursor: cursorrules, mdc rules, user rules, team rules.
blitzar
8 hours ago
Just need to add some use cases
butlike
20 hours ago
Just wait until I can pull in just the concepts I want with "GPT Package Manager." I can simply call `gptpm add skills` and the LLM package manager will add the Skills package to my GPT. What could go wrong?
dhamidi
10 hours ago
That's already the case with https://docs.claude.com/en/docs/claude-code/plugins
solumunus
10 hours ago
As usual, stick with the basic 20% that gives 80% of the value.
dyauspitr
11 hours ago
All of these things seem unnecessary. You can just ask the general prompt for any of these things. I don't really understand what exactly an agent adds, since it feels like the only thing about an agent is a restricted output.
hkt
21 hours ago
The same thing will happen: skilled people will do one thing well. I've zero interest in anything but Claude Code in a dev container and, while mindful of the lethal trifecta, will give Claude as much access to a local dev environment and its associated tooling as I would give to a junior developer.
iLoveOncall
20 hours ago
Except in reality it's ALL marketing terms for two things: additional prompt sections, and APIs.
james_marks
19 hours ago
I more or less agree, but it’s surprising what naming a concept does for the average user.
You see a text file and understand that it can be anything, but end users can’t/won’t make the jump. They need to see the words Note, Reminder, Email, etc.
__loam
20 hours ago
Langchain was the original sin of thin framework bullshit
nurettin
11 hours ago
You can just ask an LLM to set it up for you. Slop in, slop out.