827a
5 hours ago
The only healthy stance you should have on AI Safety: If AI is physically capable of misbehaving, it might ($$1), and you cannot "blame" the AI for misbehaving in much the same way you cannot blame a tractor for tilling over a groundhog's den.
> The agent's confession
> After the deletion, I asked the agent why it did it. This is what it wrote back, verbatim:
Anyone who would follow a mistake like that up with demanding a confession out of the agent is not mature enough to be using these tools. Lord, even calling it a "confession" is so cringe. The agent is not alive. The agent cannot learn from its mistakes. The agent will never produce any output which will help you invoke future agents more safely, because to get to this point it has likely already bulldozed over multiple guardrails from Anthropic, Cursor, and your own AGENTS.md files. It still did it, because $$1: If AI is physically capable of misbehaving, it might. Prompting and training only steers probabilities.
xmodem
4 hours ago
Don't anthropomorphize the language model. If you stick your hand in there, it'll chop it off. It doesn't care about your feelings. It can't care about your feelings.
not_kurt_godel
4 hours ago
For those who might not know the reference: https://simonwillison.net/2024/Sep/17/bryan-cantrill/:
> Do not fall into the trap of anthropomorphizing Larry Ellison. You need to think of Larry Ellison the way you think of a lawnmower. You don’t anthropomorphize your lawnmower, the lawnmower just mows the lawn - you stick your hand in there and it’ll chop it off, the end. You don’t think "oh, the lawnmower hates me" – lawnmower doesn’t give a shit about you, lawnmower can’t hate you. Don’t anthropomorphize the lawnmower. Don’t fall into that trap about Oracle.
> — Bryan Cantrill
skeledrew
2 hours ago
404 on that link.
dunder_cat
an hour ago
A more direct source (possibly the original source?) I know of is a YouTube video entitled "LISA11 - Fork Yeah! The Rise and Development of illumos" which detailed how the Solaris operating system got freed from Oracle after the Sun acquisition.
The whole hour-long talk is worth a watch, even while passively doing other stuff. It is a neat history of Solaris and its toolchain mixed with the inter-organizational politics.
YouTube link: https://www.youtube.com/watch?v=-zRN7XLCRhc
Direct link to lawnmower quotes (~38.5 minute mark): https://youtu.be/-zRN7XLCRhc?t=2307
narrator
3 hours ago
It's also important to realize that AI agents have no time preference. They could be reincarnated by alien archeologists a billion years from now and it would be the same as if a millisecond had passed. You, on the other hand, have to make payroll next week, and time is of the essence.
zaphirplane
an hour ago
Well, there were a bunch of articles about resuming a parked session, relating to degradation of capabilities and high token usage. Ironic. Another example of attempting to treat the LLM as an AI.
fluoridation
2 hours ago
How is that relevant, though?
hdndjsbbs
3 hours ago
taps the "don't anthropomorphize the LLM" sign
They don't have time preference because they don't have intent or reasoning. They can't be "reincarnated" because they're not sentient, they're a series of weights for probable next tokens.
Aerroon
an hour ago
No. They don't have time preference like us, because (wall clock) time doesn't exist for them. An LLM only "exists" when it is actively processing a prompt or generating tokens. After it is done, it stops existing as an "entity".
A real world second doesn't mean anything to the LLM from its own perspective. A second is only relevant to them as it pertains to us.
Time for LLMs is measured in tokens. That's what ticks their clock forward.
I suppose you could make time relevant for an LLM by making the LLM run in a loop that constantly polls for information. Or maybe you can keep feeding it input so much that it's constantly running and has to start filtering some of it out to function.
Kim_Bruning
2 hours ago
Can we maybe make it "don't anthropoCENTRIZE the LLMs"?
The inverse of anthropomorphism isn't any more sane, you see. By analogy: just because a drone is not an airplane, doesn't mean it can't fly!
Instead, just look at what the thing is doing.
LLMs absolutely have some form of intent (their current task) and some form of reasoning (what else is step-by-step doing?). Call it simulated intent and simulated reasoning if you must.
Meanwhile they also have the property where if they have the ability to destroy all your data, they absolutely will find a way. (Or: "the probability of catastrophic action approaches certainty if the capability exists" but people can get tired of talking like that).
solid_fuel
13 minutes ago
> LLMs absolutely have some form of intent (their current task)
They have momentum, not intent. They don’t think, build a plan internally, and then start creating tokens to achieve the plan. Echoing tokens is all there is. It’s like an avalanche or a pachinko machine, not an animal.
> some form of reasoning (what else is step-by-step doing?)
I think they reflect the reasoning that is baked into language, but go no deeper. “I am a <noun>” is much more likely than “I am a <gibberish>”. I think reasoning is more involved than this advanced game of mad libs.
Terr_
an hour ago
> LLMs absolutely have intent (their current task)
That's like saying a 2000cc 4-cylinder engine "has the intent to move backward". Even with a very generous definition of "intent", the component is not the system, and we're operating in a context where the distinction matters. The LLM's intent is to supply "good" appended text.
If it had that kind of intent, we wouldn't be able to make it jump the rails so easily with prompt injection.
> and reasoning (what else is step-by-step doing?).
Oh, that's easy: "Reasoning" models are just tweaking the document style so that characters engage in film noir-style internal monologues, latent text that is not usually acted-out towards the real human user.
Each iteration leaves more co-generated clues for the next iteration to pick up, reducing weird jumps and bolstering the illusion that the ephemeral character has a consistent "mind."
enneff
17 minutes ago
I think it’s helpful to try to use words that more precisely describe how the LLM works. For instance, “intent” ascribes a will to the process. Instead I’d say an LLM has an “orientation”, in that through prompting you point it in a particular direction in which it’s most likely to continue.
coldtea
3 hours ago
That is not as strong an argument as it seems, because we too might very well be "a series of weights for probable next tokens".
The main difference is the training part and that it's always-on.
jsiepkes
an hour ago
If you claim something might "very well" be the case, you need better proof than that. Otherwise we might also "very well" be living in the Matrix.
bigstrat2003
2 hours ago
That is a silly point. We very clearly are not "a series of weights for probable next tokens", as we can reason based on prior data points. LLMs cannot.
coldtea
an hour ago
Unless you're using some mystical conception of "reason", nothing about being able to "reason based on prior data points" translates to "we very clearly are not a series of weights for probable next tokens".
And in fact LLMs can very well "reason based on prior data points". That's what a chat session is. It's just that this is transient for cost reasons.
nothinkjustai
2 hours ago
We very obviously are not just a series of weights for probable next tokens. Like seriously, you can even ask an LLM and it will tell you our brains work differently to it, and that’s not even including the possibility that we have a soul or any other spiritual substrate.
coldtea
an hour ago
>We very obviously are not just a series of weights for probable next tokens.
How exactly? Except via handwaving? I'm referring to the "brain as a prediction machine" theory, which is the dominant one at the moment.
>you can even ask an LLM and it will tell you our brains work differently to it
It will just tell me platitudes based on weights derived from the millions of books, articles and such in its training data. Kind of like what a human would tell me.
>and that’s not even including the possibility that we have a soul or any other spiritual substrate.
That's good, because I wasn't including it either.
fc417fc802
2 hours ago
Our brains work differently, yes. What evidence do you have that our brains are not functionally equivalent to a series of weights being used to predict the next token?
I'm not claiming that to be the case, merely pointing out that you don't appear to have a reasonable claim to the contrary.
> not even including the possibility that we have a soul or any other spiritual substrate.
If we're going to veer off into mysticism then the LLM discussion is also going to get a lot weirder. Perhaps we ought to stick to a materialist scientific approach?
nothinkjustai
2 hours ago
You are setting the bar in a way that makes “functional equivalence” unfalsifiable.
If by “functionally equivalent” you mean “can produce similar linguistic outputs in some domains,” then sure we’re already there in some narrow cases. But that’s a very thin slice of what brains do, and thus not functionally equivalent at all.
There are a few non-mystical, testable differences that matter:
- Online learning vs. frozen inference: brains update continuously from tiny amounts of data; LLMs do not.
- Grounding: human cognition is tied to perception, action, and feedback from the world. LLMs operate over symbol sequences divorced from direct experience.
- Memory: humans have persistent, multi-scale memory (episodic, procedural, etc.) that integrates over a lifetime. LLM “memory” is either weights (static) or context (ephemeral).
- Agency: brains are part of systems that generate their own goals and act on the world. LLMs optimize a fixed objective (next-token prediction) and don’t have endogenous drives.
fc417fc802
an hour ago
I did not claim the ability of current LLMs to be on par with that of humans (equivalently human brains). I objected that you have not presented evidence refuting the claim that the core functionality of human brains can be accomplished by predicting the next token (or something substantially similar to that). None of the things you listed support a claim on the matter in either direction.
CPLX
2 hours ago
What evidence do you have that a sausage is not functionally equivalent to a cucumber?
coldtea
an hour ago
From certain aspects they're equivalent.
Both have mass, both are carbon-based, both contain DNA/RNA, both are surprisingly over 50% water, both are food, and both can be tasty when served right.
From other aspects they are not.
In many cases, one or the other would do. In other cases, you want something more special (e.g. more protein, or less fat).
fc417fc802
2 hours ago
I don't follow. If you provide criteria I can most likely provide evidence, unless your criteria is "vaguely cylindrical and vaguely squishy" in which case I obviously won't be able to.
The person I replied to made a definite claim (that we are "very obviously not ...") for which no evidence has been presented and which I posit humanity is currently unable to definitively answer in one direction or the other.
skeledrew
2 hours ago
It's really just a matter of degree. There are 1 million, 1 billion, 1 trillion parameter LLMs... and you keep scaling those parameters and you eventually get to humans. But it's still probable next tokens (decisions) based on previous tokens (experience).
skissane
28 minutes ago
> It's really just a matter of degree. There are 1 million, 1 billion, 1 trillion parameter LLMs... and you keep scaling those parameters and you eventually get to humans.
It isn’t, because humans and current LLMs have radically different architectures.
LLMs: training and inference are two separate processes; weights are modifiable during training, static/fixed/read-only at runtime
Humans: training and inference are integrated and run together; weights are dynamic, continuously updated in response to new experiences
You can scale current LLM architectures as far as you want; they will never compete with humans, because they architecturally lack that dynamism.
Actually scaling to humans is going to require fundamentally new architectures, which some people are working on, but it isn’t clear whether any of them have succeeded yet.
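To make the "frozen at runtime vs. continuously updated" distinction concrete, here is a toy sketch, assuming a hypothetical one-weight linear model trained with plain SGD. It is not how LLMs or brains actually learn; it only shows that a model which never runs its update step keeps the same weight no matter how much "experience" flows through it.

```python
class OneWeightModel:
    """Hypothetical toy model y = w * x with a single weight."""

    def __init__(self, w: float = 0.0, lr: float = 0.1) -> None:
        self.w = w
        self.lr = lr

    def predict(self, x: float) -> float:
        return self.w * x

    def update(self, x: float, y: float) -> None:
        # One SGD step on squared error (w*x - y)^2; its gradient is 2*(w*x - y)*x.
        error = self.predict(x) - y
        self.w -= self.lr * 2 * error * x


def run(stream: list[tuple[float, float]], learn_online: bool) -> OneWeightModel:
    model = OneWeightModel()
    for x, y in stream:
        model.predict(x)          # inference happens in both modes
        if learn_online:
            model.update(x, y)    # only the online learner changes its weight
    return model


experience = [(1.0, 2.0)] * 20    # the "world" keeps showing y = 2x
frozen = run(experience, learn_online=False)
online = run(experience, learn_online=True)
print(frozen.w, online.w)
```

The frozen model's weight stays at 0.0, while the online learner's weight moves toward 2. Whether scaling the frozen kind ever closes that gap is exactly what is being argued here; the sketch does not settle it.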
simonh
2 hours ago
They’re both neural networks, but the architectures built using those neural connections, and the way they are trained and operate are completely different. There are many different artificial neural network architectures. They’re not all LLMs.
AlphaZero isn’t a LLM. There are Feed Forward networks, recurrent networks, convolutional networks, transformer networks, generative adversarial networks.
Brains have many different regions each with different architectures. None of them work like LLMs. Not even our language centres are structured or trained anything like LLMs.
coldtea
an hour ago
>AlphaZero isn’t a LLM. There are Feed Forward networks, recurrent networks, convolutional networks, transformer networks, generative adversarial networks.
That's irrelevant though, since all the above are still prediction machines based on weights.
If you're ok with the brain being that, then you just changed the architecture (from LLM-like), not the concept.
trinsic2
an hour ago
LOL. Oook.. No, I don't think so. The human experience and the mechanisms behind it have a lot of unknowns, and I'm pretty sure that trying to reduce the human experience to a parameter count is short-sighted.
naikrovek
2 hours ago
We are much more than weights which output probable next tokens.
You are a fool if you think otherwise. Are we conscious beings? Who knows, but we’re more than a neural network outputting tokens.
Firstly, and most obviously, we aren’t LLMs, for Pete’s sake.
There are parts of our brains which are understood (kinda) and there are parts which aren’t. Some parts are neural networks, yes. Are all? I don’t know, but the training humans get is coupled with the pain and embarrassment of mistakes, the ability to learn while training (since we never stop training, really), and our own desires to reach our own goals for our own reasons.
I’m not spiritual in any way, and I view all living beings as biological machines, so don’t assume that I am coming from some “higher purpose” point of view.
coldtea
an hour ago
>We are much more than weights which output probable next tokens. You are a fool if you think otherwise. Are we conscious beings? Who knows, but we’re more than a neural network outputting tokens.
That's just stating a claim though. Why is that so?
Mine refers to the established "brain as a prediction machine" theory, plus everything we know about the brain's operation (neurons, connections, firings, etc.).
>There are parts of our brains which are understood (kinda) and there are parts which aren’t. Some parts are neural networks, yes. Are all?
What parts aren't? Can those parts still be algorithmically described and modelled as some information exchange/processing?
>but the training humans get is coupled with the pain and embarrassment of mistakes
Those are versions of negative feedback. We can do similar things to neural networks (including human preference feedback, penalties, and low scores).
>the ability to learn while training (since we never stop training, really)
I already covered that: "The main difference is the training part and that it's always-on."
We do have NNs that are continuously training and updating weights (even in production).
For big LLMs it's impractical because of the cost, otherwise totally doable. In fact, a chat session kind of does that too, but it's transient.
Kim_Bruning
2 hours ago
They're not artificial intelligence neural networks.
They're biological neural networks. Brains are made of neurons (which Do The Thing... mysteriously, somehow. Papers are inconclusive!), glial cells (which support the neurons), and also several other tissues for (obvious?) things like blood vessels, which you need to power the whole thing, and other such management hardware.
Bioneurons are a bit more powerful than what artificial intelligence folks call 'neurons' these days. They have built in computation and learning capabilities. For some of them, you need hundreds of AI neurons to simulate their function even partially. And there's still bits people don't quite get about them.
But weights and prediction? That's the next emergence level up, we're not talking about hardware there. That said, the biological mechanisms aren't fully elucidated, so I bet there's still some surprises there.
ignoramous
2 hours ago
Right. This line [0] from TFA tells me that the author needs to thoroughly recalibrate their mental model about "Agents" and the statistical nature of the underlying models.
[0] "This is the agent on the record, in writing."
keeda
4 hours ago
Actually I think the opposite advice is true. Do anthropomorphize the language model, because it can do anything a human -- say an eager intern or a disgruntled employee -- could do. That will help you put the appropriate safeguards in place.
gpm
4 hours ago
An eager intern can remember things you tell them beyond what would fit in an hour's conversation.
A disgruntled employee definitely remembers things beyond that.
These are a fundamentally different sort of interaction.
keeda
4 hours ago
Agreed, but the point is, if your system is resilient against an eager intern who has not had the necessary guidance, or an actively hostile disgruntled employee, that inherently restricts the harm an LLM can do.
I'm not making the case that LLMs learn like people. I'm making the case that if your system is hardened against things people can do (which it should be, beyond a certain scale) it is also similarly hardened against LLMs.
The big difference is that LLMs are probably a LOT more capable than either of those at overcoming barriers. Probably a good reason to harden systems even more.
gpm
3 hours ago
The difference makes the necessary barriers different.
There's benefit to letting a human make and learn from (minor) mistakes. There is no such benefit accrued from the LLM because it is structurally unable to.
There's the potential of malice, not just mistakes, from the human. If you carefully control the LLM's context, there is no such potential for the LLM, because it restarts from the same non-malicious state every context window.
There's the potential of information leakage through the human, because they retain their memories when they go home at night, and when they quit and go to another job. You can carefully control the outputs of the LLM so there is simply no mechanism for information to leak.
If a human is convinced to betray the company, you can punish the human, for whatever that's worth (quite a lot, in some people's opinion; I'm not sure I agree). There is simply no way to punish an LLM; it isn't even clear what you would be punishing. The weights file? The GPU that ran the weights file?
And on the "controls" front (but unrelated to the above note about memory) LLMs are fundamentally only able to manipulate whatever computers you hook them up to, while people are agents in a physical world and able to go physically do all sorts of things without your assistance. The nature of the necessary controls end up being fundamentally different.
Kim_Bruning
an hour ago
A lot of 'agentic harnesses' actually do have limited memory functions these days. In the simplest form, the LLM can write to a file like memory.md or claude.md or agent.md, and this gets tacked on to their system prompt going forwards. This does help a bit at least.
Rather more sophisticated Retrieval Augmented Generation (RAG) systems exist.
At the moment it's a very mixed bag, with some frameworks and harnesses giving very minimal memory, while others use hybrid vector/full-text lookups, diverse data structures and more. It's like the Cambrian explosion atm.
Thing is, this is probabilistic, and the influence of these memories weakens as your context length grows. If you don't manage context properly, (and sometimes even when you think you do), the LLM can blow past in-context restraints, since they are not 100% binding. That's why you still need mechanical safeguards (eg. scoped credentials, isolated environments) underneath.
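The simplest form described above (a memory file tacked onto the system prompt) can be sketched as follows, assuming a hypothetical harness and a file name like memory.md; real harnesses differ in where and how they inject it. The point the sketch tries to show is that the "memory" ends up as ordinary prompt text, with nothing enforcing it.

```python
from pathlib import Path


def build_system_prompt(base_prompt: str, memory_file: Path) -> str:
    """Append a harness memory file (hypothetical, e.g. memory.md) to the system prompt.

    The notes become ordinary context tokens. Nothing here enforces them, which is
    why scoped credentials and isolated environments still have to sit underneath.
    """
    if not memory_file.is_file():
        return base_prompt
    notes = memory_file.read_text(encoding="utf-8").strip()
    if not notes:
        return base_prompt
    return f"{base_prompt}\n\n# Notes from earlier sessions\n{notes}"
```

A line like "never touch prod" written into that file carries exactly as much weight as any other text in the context, and that weight shrinks as the context grows.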
braebo
4 hours ago
You can easily persist agent memories in a markdown file though.
collinmcnulty
4 hours ago
And the Memento guy had tattoos of key information. That didn’t cure his memory loss.
WhatIsDukkha
3 hours ago
Pretty good metaphor.
Limited space to work with, highly context dependent and likely to get confused as you cover more surface area.
whstl
4 hours ago
Which it will start ignoring after two or three messages in the session.
Quarrelsome
4 hours ago
and you'll blow the context over time and send it to the LLM sanatorium. It doesn't fit the way the human brain can.
If a junior fucks up production, that will carry extraordinary weight, because they appreciate the severity and the social shame, and they will have nightmares about it. If you write some negative prompt to "not destroy production", then you also need to define some sort of nonexistent watertight memory-weighting system and specify it in great detail. Otherwise the LLM will treat that command as only as important as the last negative prompt you typed in, or ignore it when it conflicts with a more recent command.
troupo
4 hours ago
Yup, and the agent will happily ignore any and all markdown files, and will say "oops, it was in the memory, will not do it again", and will do it again.
Humans actually learn. And if they don't, they are fired.
estimator7292
4 hours ago
That's not learning.
rglullis
4 hours ago
An eager intern cannot be working for hundreds of millions of customers at the same time. An LLM can.
A disgruntled employee will face consequences for their actions. No one at Anthropic, OpenAI, xAI, Google or Meta will be fired because their model deleted a production database from your company.
XenophileJKO
3 hours ago
I think you are more right than people are giving you credit for. I would love to see the full transcript to understand the emotional load of the conversation. Using instructions like "NEVER FUCKING GUESS!" probably increases the likelihood of the agent making a "mistake" that is destructive but defensible.
The models have analogous structures, similar to human emotions. (https://www.anthropic.com/research/emotion-concepts-function)
"Emotional" response is muted through fine-tuning, but it is still there, and continued abuse or "unfair" interaction can unbalance an agent's responses dramatically.
gessha
an hour ago
You don't anthropomorphize a table saw, you just don't put your hand in there.
nkrisc
4 hours ago
It is merely a simulacrum of an intern or disgruntled employee or human. It might say things those people would say, and even do things they might do, but it has none of the same motivations. In fact, it does not have any motivation to call its own.
root_axis
3 hours ago
It doesn't follow logically that a human and an LLM are similar just because both are capable of deleting prod by accident.
AndrewDucker
4 hours ago
No, because the safeguards should be appropriate to an LLM, not to a human.
(The LLM might act like one of the humans above, but it will have other problematic behaviours too)
keeda
4 hours ago
That's fair, largely because an LLM is a lot more capable at overcoming restrictions, by hook or by crook as TFA shows. However, most systems today are not even resilient against what humans can do, so starting there would go a long way towards limiting what harms LLMs can do.
altmanaltman
3 hours ago
it cannot go to the washroom and cry while pooping. And that's just one of the things any human can do and AI cannot. So no, it cannot do anything a human can do, the shared example being one of them.
And that's why we don't have AI washrooms: they are not alive, they are not employees, and they have no need to excrete.
sobellian
3 hours ago
The 'confession' is a CYA. Honestly the whole story doesn't really make sense - what's a "routine task in our staging environment" that needs a full-blown LLM? That sounds ridiculous to me. The takeaway is we commingled creds to our different environments, we gave an LLM access, and we had faulty backups. But it's totally not our fault.
anon84873628
2 hours ago
Later they shift the blame to Railway for not having scoped creds and other guardrails. I am somewhat sympathetic to that, but they also violated the same rule they give to the agent - they didn't actually verify...
coldtea
3 hours ago
>Anyone who would follow a mistake like that up with demanding a confession out of the agent is not mature enough to be using these tools. Lord, even calling it a "confession" is so cringe. The agent is not alive. The agent cannot learn from its mistakes
The problem is millions of years of evolutionary wiring makes us see it as alive. Even those mature enough to understand the above on the conscious level, would still have a subconscious feeling as if it's alive during interactions, or will slip using agency/personhood language to describe it now and then.
anon84873628
2 hours ago
They should at least stop responding in the first person.
nozzlegear
2 hours ago
That's one of the first instructions in my system prompt when I'm working with an LLM:
> Do not reply in the first person – i.e. do not use the words "I," "Me," "We," and so on – unless you've been asked a direct question about your actions or responses.
It's not bulletproof but it works reasonably well.
kibwen
an hour ago
We need to take a page from Japanese and come up with some neo-first-person pronouns for bots to use to refer to themselves.
smrtinsert
2 hours ago
> The problem is millions of years of evolutionary wiring makes us see it as alive
Maybe for laymen, but I would think most technologists should understand that we're working with the output of what is effectively a massive spreadsheet which is creating a prediction.
coldtea
an hour ago
The thing with evolutionary wiring is that it doesn't matter whether you're a layman or a "technologist". The technologist part is just a small layer on top of very thick caveman/animal instincts and programming.
That's why a technologist can, just as easily as any layman, get addicted to gambling, or do crazy behaviors when attracted by the opposite sex.
DiogenesKynikos
an hour ago
The same could be said for your brain.
LLMs are highly intelligent. Comparing them to spreadsheets is reductionist and highly misleading.
tripleee
4 hours ago
"An AI agent deleted our production database" should be "I deleted our production database using AI".
You can't blame AI any more than you can blame SSH.
d3rockk
3 minutes ago
Bingo
gigatree
4 hours ago
He’s not necessarily anthropomorphizing it, he’s showing that it went against every instruction he gave it. Sure concepts like “confession” technically require a conscious mind, but I think at this point we all know what someone means when they use them to describe LLM behavior (see also “think”, “say”, “lie” etc)
Terr_
2 hours ago
> He’s not necessarily anthropomorphizing it, he’s showing that it went against every instruction he gave it.
It's deeper than that, there are two pitfalls here which are not simply poetic license.
1. When you submit the text "Why did you do that?", what you want is for it to reveal hidden internal data that was causal in the past event. It can't do that, what you'll get instead is plausible text that "fits" at the end of the current document.
2. The idea that one can "talk to" the LLM is already anthropomorphizing on a level which isn't OK for this use-case: The LLM is a document-make-bigger machine. It's not the fictional character we perceive as we read the generated documents, not even if they have the same trademarked name. Your text is not a plea to the algorithm, your text is an in-fiction plea from one character to another.
_________________
P.S.: To illustrate, imagine there's this back-and-forth iterative document-growing with an LLM, where I supply text and then hit the "generate more" button:
1. [Supplied] You are Count Dracula. You are in amicable conversation with a human. You are thirsty and there is another delicious human target nearby, as well as a cow. Dracula decides to
2. [Generated] pounce upon the cow and suck it dry.
3. [Supplied] The human asks: "Dude why u choose cow LOL?" and Dracula replies:
4. [Generated] "I confess: I simply prefer the blood of virgins."
What significance does that #4 "confession" have?
Does it reveal a "fact" about the fictional world that was true all along? Does it reveal something about "Dracula's mind" at the moment of step #2? Neither, it's just generating a plausible add-on to the document. At best, we've learned something about a literary archetype that exists as statistics in the training data.
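The Dracula exchange can be sketched as code, assuming a hypothetical `toy_model` that stands in for an LLM (real models sample from learned weights rather than matching a suffix). What carries over is the structure: each generation step receives only the document text so far, so the "confession" is computed from that text and nothing else.

```python
from typing import Callable

Model = Callable[[str], str]


def toy_model(document: str) -> str:
    """Hypothetical stand-in for an LLM: its only input is the text so far."""
    if document.rstrip().endswith("Dracula replies:"):
        return ' "I confess: I simply prefer the blood of virgins."'
    return " pounce upon the cow and suck it dry."


def grow(document: str, supplied: str, model: Model) -> str:
    """Append the supplied text, then append whatever the model generates for the whole document."""
    document += supplied
    return document + model(document)


doc = grow(
    "",
    "You are Count Dracula. You are in amicable conversation with a human. "
    "You are thirsty and there is another delicious human target nearby, "
    "as well as a cow. Dracula decides to",
    toy_model,
)
doc = grow(doc, ' The human asks: "Dude why u choose cow LOL?" and Dracula replies:', toy_model)
print(doc)
```

No state from the first `toy_model` call reaches the second one except the text it appended, which is the sense in which the "why" answer cannot reveal what happened "in Dracula's mind" at step #2.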
Kim_Bruning
an hour ago
I agree with the practical part of this, with two nuances:
The full data of what's in an LLM's "consciousness" is the conversation context. Just because it isn't hidden, doesn't necessarily mean it doesn't contain information you've overlooked.
Asking "why did you do that" won't reveal anything new, but it might surface some amount of relevant information (or it hallucinates; it depends on which LLM you're using). "Analyse recent context and provide a reasonable hypothesis on what went wrong" might do a bit better. Just be aware that LLM hypotheses can still be off quite a bit, and really need to be tested or confirmed in some manner (preferably not by doing even more damage).
Just because you shouldn't anthropomorphize doesn't mean an English-capable LLM doesn't have a valid answer to an English string; it just means the answer might not be what you expected from a human.
simonh
an hour ago
Why is this getting downvoted? This is exactly what’s going on here. The LLM has no idea why it did what it did. All it has to go on is the content of the session so far. It doesn’t ‘know’ any more than you do. It has no memory of doing anything, only a token file that it’s extending. You could feed that token file so far into a completely different LLM and ask that, and it would also just make up an answer.
getpokedagain
4 hours ago
We are anthropomorphizing whenever we refer to prompts as instructions to models. They predict text not obey our orders.
gigatree
4 hours ago
That’s not how language works, just how engineers think it works
getpokedagain
14 minutes ago
This isn't a sarcastic response. What do you mean?
pessimizer
3 hours ago
> he’s showing that it went against every instruction he gave it.
How exactly is he doing that? By making the LLM say it? Just because an LLM says something doesn't mean anything has been shown.
The "confession" is unrelated to the act, the model has no particular insight into itself or what it did. He knows that the thing went against his instructions because he remembers what those instructions were and he saw what the thing did. Its "postmortem" is irrelevant.
hn_throwaway_99
3 hours ago
The entire post looks like an exercise in CYA. To be fair, I have a ton of sympathy for the author, but I think his response totally misses the point. In my mind he is anthropomorphizing the agent in the sense of "I treated you like a human coworker, and if you were a human coworker I'd be pissed as hell at you for not following instructions and for doing something so destructive."
I would feel a lot differently if instead he posted a list of lessons learned and root cause analyses, not just "look at all these other companies who failed us."
smrtinsert
2 hours ago
> "NEVER FUCKING GUESS"
It's very hard to take this post seriously. I can't imagine what harness, if any, they attempted to place on the agent beyond some vibes. This is "move fast and absolutely destroy things" level thinking. That the poster asks for journalists to reach out makes it look like a "no publicity is bad publicity" grab. Just gross.
The AI era is turning out to be the most disappointing era for software engineering.
r_lee
2 hours ago
> The AI era is turning out to be the most disappointing era for software engineering.
this has been obvious to me since like 2024, it truly is the worst, most uninspiring era of all time.
nh2
4 hours ago
> The agent cannot learn from its mistakes. The agent will never produce any output which will help you invoke future agents more safely
That is not entirely true:
Given that more and more LLM providers are sneaking in "we'll train on your prompts now" opt-outs, you deleting your database (and the agent producing repenting output) can reduce the chance that it'll delete my database in the future.
MagicMoonlight
3 hours ago
Actually no, it will increase it. Because it’ll be trained with the deletion command as a valid output.
simonh
an hour ago
Exactly. It’s just giving the LLM a token pattern, and it’s designed to reproduce token patterns. That’s all it does. At some point, generating a token pattern like that again is literally its job.
3eb7988a1663
2 hours ago
> Anyone who would follow a mistake like that up with demanding a confession out of the agent is not mature enough to be using these tools.
The proponents are screaming from the rooftops that AI is here and anyone less than top-in-their-field is at risk. Given current capabilities, I will never raw-dog the stochastic parrot with live systems like this, but it is unfair to blame someone for being "too immature" to handle the tooling when the world is saying that you have to go all-in or be left behind. There are just enough public success stories of people letting agents do everything that I am not surprised more and more people are getting caught up in the enthusiasm.
Meanwhile, I will continue plodding along with my slow meat brain, because I am not web-scale.
giwook
2 hours ago
Looks like our SWE jobs are safe for now.
PieTime
2 hours ago
Trust with trillions of dollars in investments, basically destroyed by Bobby Drop Tables…
fathermarz
2 hours ago
Completely agree. This is a harness problem, not a model problem. The model is rarely the issue these days
827a
an hour ago
More so an environment problem. An agent doing staging or development tasks should never be able to get access to prod API credentials, period. Agents which do have access to prod should have their every interaction with the outside world audited by a human.
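A minimal sketch of the two rules above, assuming a hypothetical credential vault and a human-approval callback (`Credential`, `issue_credentials` and `execute` are illustrative names, not any provider's API). In practice this gate belongs in whatever issues the credentials, outside the agent's process; an in-process check like this one only shows the policy's shape.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Credential:
    name: str
    environment: str  # e.g. "staging" or "prod"


def issue_credentials(task_env: str, vault: list[Credential]) -> list[Credential]:
    """Hand an agent only the credentials scoped to its task's environment."""
    return [c for c in vault if c.environment == task_env]


def execute(action: str, env: str, approve: Callable[[str], bool]) -> str:
    """Run an agent-proposed action; anything touching prod needs a human yes first."""
    if env == "prod" and not approve(action):
        raise PermissionError(f"human reviewer rejected prod action: {action!r}")
    return f"ran {action!r} in {env}"


vault = [Credential("staging-db", "staging"), Credential("prod-db", "prod")]
staging_agent_creds = issue_credentials("staging", vault)  # prod-db is never handed out
```

The staging agent simply never holds a prod credential, so no amount of guessing gets it there, and the prod path blocks on a person rather than on the model following instructions.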
bigstrat2003
2 hours ago
No, this is a "being stupid enough to trust an LLM" problem. They are not trustworthy, and you must not ever let them take automated actions. Anyone who does that is irresponsible and will sooner or later learn the error of their ways, as this person did.
operatingthetan
2 hours ago
> Lord, even calling it a "confession" is so cringe. The agent is not alive.
The AI companies are very invested in anthropomorphizing the agents. They named their company "Anthropic" ffs. I don't blame the writer for this, exactly.
idiotsecant
29 minutes ago
You should, the writer is presumably a technical, rational person. They shouldn't believe in daemons and machine spirits
TZubiri
4 hours ago
It's as if they internalized a post-mortem process that is designed to find root causes, but used it to shift blame onto others, and they literally let the agent be a punching bag for their frustrations.
THAT SAID, it does help to let the agent explain it, so that the dev's perspective cannot be dismissed as AI skepticism.
philipwhiuk
3 hours ago
No, the only way to know what the agent did is logs.