JohnMakin
15 hours ago
> Building powerful and reliable AI Agents is becoming less about finding a magic prompt or model updates.
Ok, I can buy this
> It is about the engineering of context and providing the right information and tools, in the right format, at the right time.
when the "right" format and "right" time are essentially, and maybe even necessarily, undefined, then aren't you still reaching for a "magic" solution?
If the definition of "right" information is "information which results in a sufficiently accurate answer from a language model," then I fail to see how you are doing anything fundamentally different from prompt engineering. Since these are non-deterministic machines, I fail to see any reliable heuristic that is fundamentally distinguishable from "trying and seeing" with prompts.
mentalgear
14 hours ago
It's magical thinking all the way down. Whether they call it "prompt" or "context" engineering, it's the same tinkering to find something that "sticks" in non-deterministic space.
nonethewiser
9 hours ago
>Whether they call it "prompt" or "context" engineering, it's the same tinkering to find something that "sticks" in non-deterministic space.
I don't quite follow. Prompts and contexts are different things. Sure, you can get things into the context with prompts, but that doesn't mean they are entirely the same.
You could have a long-running conversation with a lot in the context. A given prompt may work poorly late in that conversation, whereas it would have worked quite well earlier. I don't think this difference is purely semantic.
For whatever it's worth I've never liked the term "prompt engineering." It is perhaps the quintessential example of overusing the word engineering.
Turskarama
5 hours ago
Both the context and the prompt are just part of the same input. To the model there is no difference, the only difference is the way the user feeds that input to the model. You could in theory feed the context into the model as one huge prompt.
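With an open-weights model you can see this directly: the chat "structure" is flattened into one big token stream by a chat template. A minimal sketch (the model name is just an illustration; any chat model with a template works):

from transformers import AutoTokenizer

# Illustrative model name - substitute whatever open-weights model you use.
tok = AutoTokenizer.from_pretrained("meta-llama/Llama-3.1-8B-Instruct")
messages = [
    {"role": "system", "content": "translate to English"},
    {"role": "user", "content": "An explanation of dogs: ..."},
]
# One big string, with special delimiter tokens marking the roles -
# "context" and "prompt" end up in the same input.
print(tok.apply_chat_template(messages, tokenize=False, add_generation_prompt=True))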
__loam
an hour ago
Sometimes I wonder if LLM proponents even understand their own bullshit.
It's all just tokens in the context window right? Aren't system prompts just tokens that stay appended to the front of a conversation?
They're going to keep dressing this up six different ways to Sunday but it's always just going to be stochastic token prediction.
simonw
22 minutes ago
System prompts don't even have to be appended to the front of the conversation. For many models they are actually modeled using special custom tokens - so the token stream looks a bit like:
<system-prompt-starts>
translate to English
<system-prompt-ends>
An explanation of dogs: ...
The models are then trained to (hopefully) treat the system prompt delimited tokens as more influential on how the rest of the input is treated.
StevenWaterman
27 minutes ago
Yep, every AI call is essentially just asking it to predict what the next word is after:
<system>
You are a helpful assistant.
</system>
<user>
Why is the sky blue?
</user>
<assistant>
Because of Rayleigh scattering. The blue light refracts more.
</assistant>
<user>
Why is it red at sunset then?
</user>
<assistant>
And we keep repeating that until the next word is `</assistant>`, then extract the bit in between the last assistant tags, and return it. The AI has been trained to look at `<user>` differently from `<system>`, but they're not physically different.
It's all prompt; it can all be engineered. Hell, you can even get a long way by pre-filling the start of the assistant response - that usually works better than a system message. That's prompt engineering too.
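Pre-filling is just ending the message list on an assistant turn. A rough sketch against the Anthropic Messages API (model name illustrative):

import anthropic

client = anthropic.Anthropic()  # assumes ANTHROPIC_API_KEY is set

resp = client.messages.create(
    model="claude-3-5-sonnet-latest",  # illustrative model name
    max_tokens=300,
    system="You are a helpful assistant.",
    messages=[
        {"role": "user", "content": "Why is the sky red at sunset?"},
        # Ending on an assistant message "pre-fills" the reply: the model
        # continues from this text instead of starting fresh.
        {"role": "assistant", "content": "In short:"},
    ],
)
print(resp.content[0].text)  # the continuation after the pre-filled text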
pennaMan
5 hours ago
I always used "prompting" to mean "providing context" in general, not necessarily just clever instructions, which is how people seem to be using the term.
And yes, I view clever instructions like "great grandma's last wish" still as just providing context.
>A given prompt may work poorly, whereas it would have worked quite well earlier.
The context is not the same! Of course the "prompt" (the clever last sentence you just added to the context) is not going to work "the same". The model has a different context now.
ffsm8
8 hours ago
Yeah, if anything it should be called an art.
The term "engineering" makes little sense in this context, but really, did it make sense for e.g. "QA Engineer" and all the other jobs we've tacked it onto? I don't think so, so it's kinda pointless to argue about it after we've been misusing the term for well over 10 years.
groestl
7 hours ago
Well, to get the right thing into the context in a performant way when you're dealing with a huge dataset is definitely engineering.
shakna
4 hours ago
Engineering tends to mean "the application of scientific and mathematical principles to practical ends".
I'm not sure there's much scientific or mathematical about guessing how a non-deterministic system will behave.
SonOfLilit
an hour ago
The moment you start building evaluation pipelines and running experiments to validate your ideas, it stops being guessing.
simonw
21 minutes ago
Right: for me that's when "prompt engineering"/"context engineering" start to earn the "engineering" suffix: when people start being methodical and applying techniques like evals.
belter
an hour ago
Got it...updating CV to call myself a VibeOps Engineer in a team of Context Engineers...A few of us were let go last quarter, as they could only do Prompt Engineering.
surecoocoocoo
6 hours ago
We used to define a specification.
In other words: context.
But that was like old man programming.
As if the laws of physics changed between 1970 and 2009.
ironmagma
5 hours ago
What is all software but tinkering?
I mean this not as an insult to software dev but to work generally. It’s all play in the end.
v3ss0n
3 hours ago
At this point, due to the non-deterministic nature and hallucination, context engineering is pretty much magic. But here are our findings.
1 - LLMs tend to pick up and understand context that comes in the first 7-12 lines. Mostly the first 1k tokens are best understood (tested on Claude and several open-source models), so the most important context, like parsing rules, needs to be placed there.
2 - Keep the context short. Whatever context limit they claim is not true. They may have a long context window of 1M tokens, but on average only about the first 10k tokens have good accuracy and recall; the rest is just bunk - ignore it. Write the prompt, then try compressing/summarizing it without losing key information, either manually or with an LLM.
3 - If you build agent-to-agent orchestration, don't build agents with long contexts and multiple tools. Break them down into several agents with different sets of tools, then put a planning agent on top which solely does handover (rough sketch below).
4 - If all else fails, write the agent handover logic in code - as it always should be.
From building 5+ agent-to-agent orchestration projects in different industries using autogen + Claude - that is the result.
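A stripped-down sketch of point 3, ignoring the autogen specifics (call_llm is a hypothetical wrapper around whatever model API you use):

# call_llm(system, user) -> str is a hypothetical stand-in for your model API.
AGENTS = {
    "billing": "You handle billing questions. Tools: refund, invoice_lookup.",
    "code":    "You answer codebase questions. Tools: grep, read_file.",
}

def plan(task: str) -> str:
    # The planner's only job is to pick an agent - keep its context tiny.
    prompt = f"Task: {task}\nReply with exactly one of: {', '.join(AGENTS)}"
    return call_llm("You are a router. Output one word.", prompt).strip()

def run(task: str) -> str:
    # Each specialist starts with a fresh, short context and few tools.
    return call_llm(AGENTS[plan(task)], task)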
lblume
an hour ago
I have uploaded entire books to the latest Gemini and had the model reliably and accurately answer specific questions requiring knowledge of multiple chapters.
FeepingCreature
11 minutes ago
I think it works for info but not so well for instructions/guidance. That's why the standard advice is instructions at the start and repeated at the end.
fwn
9 minutes ago
That’s pretty typical, though not especially reliable. (Although in my experience, Gemini currently performs slightly better than ChatGPT for my case.)
In one repetitive workflow, for example, I process long email threads, large Markdown tables (which is a format from hell), stakeholder maps, and broader project context, such as roles, mailing lists, and related metadata. I feed all of that into the LLM, which determines the necessary response type (out of a given set), selects appropriate email templates, drafts replies, generates documentation, and outputs a JSON table.
It gets it right on the first try about 75% of the time, easily saving me an hour a day - often more.
Unfortunately, 10% of the time, the responses appear excellent but are fundamentally flawed in some way. Just so it doesn't get boring.
Aeolun
9 hours ago
There is only so much you can do with prompts. To go from the 70% accuracy you can achieve with that to the 95% accuracy I see in Claude Code, the context is absolutely the most important thing, and it’s visible how much effort goes into making sure Claude retrieves exactly the right context, often at the expense of speed.
majormajor
8 hours ago
Why are we drawing a difference between "prompt" and "context" exactly? The linked article is a bit of puffery that redefines a commonly-used term - "context" - to mean something different than what it's meant so far when we discuss "context windows." It seems to be just a way to generate new hype.
When you play with the APIs the prompt/context all blurs together into just stuff that goes into the text fed to the model to produce text. Like when you build your own basic chatbot UI and realize you're sending the whole transcript along with every step. Using the terms from the article, that's "State/History." Then "RAG" and "Long term memory" are ways of working around the limits of context window size and the tendency of models to lose the plot after a huge number of tokens, to help make more effective prompts. "Available tools" info also falls squarely in the "prompt engineering" category.
The reason prompt engineering is going the way of the dodo is because tools are doing more of the drudgery to make a good prompt themselves. E.g., finding relevant parts of a codebase. They do this with a combination of chaining multiple calls to a model together to progressively build up a "final" prompt plus various other less-LLM-native approaches (like plain old "find").
So yeah, if you want to build a useful LLM-based tool for users you have to write software to generate good prompts. But... it ain't really different than prompt engineering other than reducing the end user's need to do it manually.
It's less that we've made the AI better and more that we've made better user interfaces than just-plain-chat. A chat interface on a tool that can read your code can do more, more quickly, than one that relies on you selecting all the relevant snippets. A visual diff inside of a code editor is easier to read than a markdown-based rendering of the same in a chat transcript. Etc.
arugulum
8 hours ago
Because the author is artificially shrinking the scope of one thing (prompt engineering) to make its replacement look better (context engineering).
Never mind that prompt engineering goes back to pure LLMs before ChatGPT was released (i.e. before the conversation paradigm was even the dominant one for LLMs), and covers everything from few-shot prompting (including question-answer pairs) and providing tool definitions and examples to retrieval-augmented generation and conversation-history manipulation. In academic writing, LLMs are often defined as a distribution P(y|x), where x is not infrequently referred to as the prompt. In other words, anything that comes before the output is considered the prompt.
But if you narrow the definition of "prompt" down to "user instruction", then you get to ignore all the work that's come before and talk up the new thing.
Aeolun
2 hours ago
> Why are we drawing a difference between "prompt" and "context" exactly?
Because they’re different things? The prompt doesn’t dynamically change. The context changes all the time.
I’ll admit that you can just call it all ‘context’ or ‘prompt’ if you want, because it’s essentially a large chunk of text. But it’s convenient to be able to distinguish between the two so you know you’re talking about the same thing.
__loam
an hour ago
It's all the same blob of text in the API call.
FeepingCreature
10 minutes ago
There's always been a distinction between prompt and data.
simonw
8 hours ago
One crucial difference between the prompt and the context: the prompt is just content that is provided by a user. The context also includes text that was output by the bot - in conversational interfaces the context incorporates the system prompt, then the user's first prompt, the LLM's reply, the user's next prompt, and so on.
majormajor
8 hours ago
Here, even making that distinction of prompt-as-most-recent-user-input-only: if we use "context" as it's generally been defined for "context window," then RAG and such are not part of the context. They are just things that certain applications might use to enrich the context.
But personally I think a focus on "prompt" that refers to a specific text box in a specific application vs using it to refer to the sum total of the model input increases confusion about what's going on behind the scenes. At least when referring to products built on the OpenAI Chat Completions APIs, which is what I've used the most.
Building a simple dummy chatbot UI is very informative here for de-mystifying things and avoiding misconceptions about the model actually "learning" or having internal "memory" during your conversation. You're just supplying a message history as the model input prompt. It's your job to keep submitting the history - and you're perfectly able to change it if you like (such as rolling up older messages to keep a shorter context window).
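A minimal sketch of such a dummy chatbot loop against the OpenAI Chat Completions API (model name illustrative):

from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set
history = [{"role": "system", "content": "You are a helpful assistant."}]

def chat(user_msg: str) -> str:
    history.append({"role": "user", "content": user_msg})
    # No hidden memory: the whole "conversation" is this list we re-send.
    resp = client.chat.completions.create(model="gpt-4o-mini", messages=history)
    reply = resp.choices[0].message.content
    history.append({"role": "assistant", "content": reply})
    # Naive roll-up: drop the oldest user/assistant pair once it gets long.
    if len(history) > 20:
        del history[1:3]
    return reply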
dinvlad
14 hours ago
> when the "right" format and "right" time are essentially, and maybe even necessarily, undefined, then aren't you still reaching for a "magic" solution?
Exactly the problem with all "knowing how to use AI correctly" advice out there rn. Shamans with drums, at the end of the day :-)
phyalow
2 hours ago
“non-deterministic machines“
Not correct. They are deterministic as long as a static seed is used.
kazga
2 hours ago
That's not true in practice. Floating point arithmetic is not associative due to rounding errors, and parallel operations introduce non-determinism even at temperature 0.
phyalow
2 hours ago
What? You can get consistent output on local models.
I can train large nets deterministically too (cuBLAS flags). What you're saying isn't true in practice. Hell, I can also go on the Anthropic API right now and get verbatim static results.
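For the training side, a sketch of the usual PyTorch incantation (runs still have to match in hardware, library versions, and batch sizes):

import os
# Must be set before CUDA initializes; required by cuBLAS for determinism.
os.environ["CUBLAS_WORKSPACE_CONFIG"] = ":4096:8"

import torch
torch.manual_seed(0)
# Errors out if an op without a deterministic implementation is used.
torch.use_deterministic_algorithms(True)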
simonw
29 minutes ago
"Hell I can also go on the anthropic API right now and get verbatim static results."
How?
Setting temperature to 0 won't guarantee the exact same output for the exact same input, because - as the previous commenter said - floating point arithmetic is non-associative, which becomes important when you are running parallel operations on GPUs.
andy99
14 hours ago
It's called over-fitting; that's basically what prompt engineering is.
evjan
4 hours ago
That doesn't sound like how I understand over-fitting, but I'm intrigued! How do you mean?
felipeerias
10 hours ago
If someone asked you about the usages of a particular element in a codebase, you would probably give a more accurate answer if you were able to use a code search tool rather than reading every source file from top to bottom.
For that kind of task (and there are many of those!), I don't see why you would expect something fundamentally different in the case of LLMs.
skydhash
2 hours ago
But why not provide the search tool instead of being an imperfect interface between it and the person asking? The only reason for the latter is that you have more applied knowledge in the context and can use the tool better. For any other case, the answer should be “use this tool”.
__loam
an hour ago
The uninformed would rather have a natural language interface rather than learn how to actually use the tools.
skydhash
26 minutes ago
The reason for the expert in this case (an uninformed person who wants to solve a problem) is that the expert can use metaphors as a bridge for understanding. Just like in most companies, there's the business world (which is heterogeneous) and the software engineering world. A huge part of a software engineer's time is spent translating concepts across the two. And the most difficult part of that is asking questions, and knowing which question to ask, since natural language is so ambiguous.
manishsharan
20 minutes ago
I provided 'grep' as a tool to LLM (deepseek) and it does a better job of finding usages. This is especially true if the code is obfuscated JavaScript.
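The tool definition is just a schema in the OpenAI-compatible format that DeepSeek's API accepts - roughly like this (names and descriptions are my own):

GREP_TOOL = {
    "type": "function",
    "function": {
        "name": "grep",
        "description": "Search the repo for a regex; returns matching lines prefixed with file:line.",
        "parameters": {
            "type": "object",
            "properties": {
                "pattern": {"type": "string", "description": "Regex to search for"},
                "path": {"type": "string", "description": "Directory to search, e.g. 'src/'"},
            },
            "required": ["pattern"],
        },
    },
}
# Passed as tools=[GREP_TOOL]; the model decides when to call it, and you
# run the actual grep and feed the output back as a tool message.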
edwardbernays
15 hours ago
The state of the art theoretical frameworks typically separates these into two distinct exploratory and discovery phases. The first phase, which is exploratory, is best conceptualized as utilizing an atmospheric dispersion device. An easily identifiable marker material, usually a variety of feces, is metaphorically introduced at high velocity. The discovery phase is then conceptualized as analyzing the dispersal patterns of the exploratory phase. These two phases are best summarized, respectively, as "Fuck Around" followed by "Find Out."
autobodie
8 hours ago
The problem is that "right" is defined circularly.
colordrops
6 hours ago
> Since these are non-deterministic machines, I fail to see any reliable heuristic that is fundamentally distinguishable from "trying and seeing" with prompts
There are many sciences involving non-determinism that still have laws and patterns, e.g. biology and maybe psychology. It's not all or nothing.
Also, LLMs are deterministic, just not predictable. The non-determinism is injected by providers.
Anyway is there an essential difference between prompt engineering and context engineering? They seem like two names for the same thing.
simonw
an hour ago
They arguably are two names for the same thing.
The difference is that "prompt engineering" as a term has failed, because to a lot of people the inferred definition is "a laughably pretentious term for typing text into a chatbot" - it's become indistinguishable from end-user prompting.
My hope is that "context engineering" better captures the subtle art of building applications on top of LLMs through carefully engineering their context.
pbreit
6 hours ago
What's the difference?
FridgeSeal
13 hours ago
It’s just AI people moving the goalposts now that everyone has realised that “prompt engineering” isn’t a special skill.
coliveira
11 hours ago
In other words, "if AI doesn't work for you, the problem is not AI, it is the user" - that's what AI companies want us to believe.
shermantanktop
11 hours ago
That’s a good indicator of an ideology at work: no-true-Scotsman deployed at every turn.
j45
9 hours ago
Everything is new to someone, and the terms of reference will evolve.
ninetyninenine
8 hours ago
Yeah, but do we have to make a new buzzword out of it? "Context engineer"
PeterStuer
6 hours ago
"these are non-deterministic machines"
Only if you choose so by allowing some degree of randomness with the temperature setting.
pegasus
4 hours ago
They are usually nondeterministic even at temperature 0 - due to things like parallelism and floating point rounding errors.
edflsafoiewq
4 hours ago
In the strict sense, sure, but the point is they depend not only on the seed but on seemingly minor variations in the prompt.
zelphirkalt
5 hours ago
This is what irks me so often when reading these comments. This is just software inside an ordinary computer; it always does the same thing with the same input, which includes hidden and global state. Stating that they are "non-deterministic machines" sounds like throwing in the towel and thinking "it's magic!" I am not even sure what people actually want to express when they make these false statements.
If one wants to make something give the same answers every time, one needs to control all the variables of input. This is like any other software, including other machine learning algorithms.
csallen
12 hours ago
This is like telling a soccer player that no change in practice or technique is fundamentally different than another, because ultimately people are non-deterministic machines.