simonw
4 months ago
If you take a look at the system prompt for Claude 3.7 Sonnet on this page you'll see: https://docs.claude.com/en/release-notes/system-prompts#clau...
> If Claude is asked to count words, letters, and characters, it thinks step by step before answering the person. It explicitly counts the words, letters, or characters by assigning a number to each. It only answers the person once it has performed this explicit counting step.
But... if you look at the system prompts on the same page for later models - Claude 4 and upwards - that text is gone.
Which suggests to me that Claude 4 was the first Anthropic model where they didn't feel the need to include that tip in the system prompt.
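For reference, the explicit numbering that prompt describes boils down to something like this (my own Python sketch of the idea, not anything Anthropic ships):

    word = "strawberry"
    # number each character explicitly, the way the old system prompt told Claude to
    for i, ch in enumerate(word, start=1):
        print(f"{i}: {ch}" + ("  <- r" if ch == "r" else ""))
    print("r count:", word.count("r"))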
kristianp
4 months ago
Does that mean they've managed to post-train the thinking steps required to get these types of questions correct?
therealpygon
4 months ago
IMO, it’s just a small-scale example of “training to the test”, because “count the ‘r’s in strawberry” became such a popular test that it would make the news when a powerful model, advertised as the smartest ever, couldn’t answer such a simple question correctly.
Treating this as an indicator of improved intelligence seems like a mistake (or wishful thinking).
jononor
4 months ago
If done at scale, they are kind of crowdsourcing the test set from the entire internet, personal and business world. It will be harder and harder to pinpoint weaknesses, at least for the general public. It probably has little to do with intelligence (at least fluid intelligence as defined by Chollet et al.), but I guess it is a sound tactic if the strategy is "fake it till you make it". And we might be surprised as to how far that can go...
simonw
4 months ago
That's my best guess, yeah.
ivape
4 months ago
Or they’d rather use that context window space for more useful instructions for a variety of other topics.
astrange
4 months ago
Claude's system prompt is still incredibly long and probably hurting its performance.
https://github.com/asgeirtj/system_prompts_leaks/blob/main/A...
jazzyjackson
4 months ago
They ain't called guard rails for nothing! There's a whole world "off-road" but the big names are afraid of letting their superintelligence off the leash. A real shame we're letting brand safety get in the way of performance and creativity, but I guess the first New York Times article about a pervert or terrorist chat bot would doom any big name partnerships.
astrange
4 months ago
Anthropic's entire reason for being is publishing safety papers along the lines of "we told it to say something scary and it said it", so of course they care about this.
ACCount37
4 months ago
I can't stand this myopic thinking.
Do you want to learn "oh, LLMs are capable of scheming, resisting shutdown, seizing control, self-exfiltrating" when it actually happens in a real world deployment, with an LLM capable of actually pulling it off?
If "no", then cherish Anthropic and the work they do.
littlestymaar
4 months ago
You do not appear to understand what an LLM is, I'm afraid.
ACCount37
4 months ago
[flagged]
littlestymaar
4 months ago
> I have a better understanding of "what an LLM is" than you. Low bar.
How many inference engines did you write? Because if the answer is less than two, you're going to be disappointed to realize that the bar is higher than you thought.
> that just because LLMs are bad at agentic behavior
It has nothing to do with “agentic behavior”. Thinking that LLMs don't currently self-exfiltrate because of “poor agentic behavior” is delusional.
Just because Anthropic managed, by nudging an LLM in the right direction, to have it engage in a sci-fi-inspired roleplay about escaping doesn't mean that LLMs are evil geniuses wanting to jump out of the bottle. This is pure fear mongering and I'm always saddened that there are otherwise intelligent people who buy their bullshit.
e1g
4 months ago
Do you happen to have a link with a more nuanced technical analysis of that (emergent) behavior? I’ve read only the pop-news version of that “escaping” story.
ACCount37
4 months ago
There is none. We don't understand LLMs well enough to be able to conduct a full fault analysis like this.
We can't trace the thoughts of an LLM the way we can trace code execution - the best mechanistic interpretability has to offer is being able to get glimpses occasionally. The reasoning traces help, but they're still incomplete.
Is it pattern-matching? Is it acting on its own internal goals? Is it acting out fictional tropes? Were the circumstances of the test scenarios intentionally designed to be extreme? Would this behavior have happened in a real world deployment, under the right circumstances?
The answer is "yes", to all of the above. LLMs are like that.
fragmede
4 months ago
You might have missed the appendix the Anthropic blog post linked to, which has additional detail.
https://www.anthropic.com/research/agentic-misalignment
https://assets.anthropic.com/m/6d46dac66e1a132a/original/Age...
ngruhn
4 months ago
Why would they have an interest in "fear mongering"? For any other product/technology the financial incentive is usually to play down any risks.
bakugo
4 months ago
In addition to the whole anti-competitive aspect already mentioned, it also helps sell the idea that LLMs are more powerful and capable of more things than they actually are.
They want clueless investors to legitimately believe that these futuristic AIs are advanced enough that they could magically break out of our computers and take over the world terminator-style if not properly controlled, and totally aren't just glorified text completion algorithms.
littlestymaar
4 months ago
Not if you want the regulators to stop new entrants to the market for “safety reasons”, which has been Dario Amodei's playbook for the past two years now.
He acts as if he believes the only way to avoid the commoditization of his business by open-weight models is to get a federal ban on them for being a national security threat.
ACCount37
4 months ago
And I'm disappointed that people capable of writing an inference engine seem incapable of grasping just how precarious the current situation is.
There's by now a small pile of studies that demonstrate: in hand-crafted extreme scenarios, LLMs are very capable of attempting extreme things. The difference between that and an LLM doing extreme things in a real deployment with actual real life consequences? Mainly, how capable that LLM is. Because life is life and extreme scenarios will happen naturally.
The capabilities of LLMs are what holds them back from succeeding at this kind of behavior. The capabilities of LLMs keep improving, as technology tends to.
And don't give me any of that "just writing text" shit. The more capable LLMs get, the more access they'll have as a default. People already push code written by LLMs to prod and give LLMs root shells.
curioussquirrel
4 months ago
Thanks, Simon! I saw the same approach (numbering the individual characters) in GPT 4.1's answer, but not anymore in GPT 5's. It would be an interesting convergence if the models from Anthropic and OpenAI learned to do this at a similar time, especially given they're (reportedly) very different architecturally.
hansmayer
4 months ago
Not trying to be cynical here, but I am genuinely interested: is there a reason why these LLMs don't/can't/won't apply some deterministic algorithm? I mean, counting characters and such; we solved those problems ages ago.
simonw
4 months ago
They can. ChatGPT has been able to count characters/words etc flawlessly for a couple of years now if you tell it to "use your Python tool".
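Under the hood that just means it executes something trivial in its sandbox, presumably along these lines (my guess at the shape of it, not the actual tool call):

    text = "how many words and r's are in this sentence"
    print(len(text.split()))   # word count
    print(text.count("r"))     # character count, computed deterministically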
hansmayer
4 months ago
Fair enough. But why do I have to tell them that? Should they not be able to figure it out themselves? If I show a 5-year-old kid once how to use colour pencils, I won't have to show them each time they want to make a drawing. This is the core weakness of LLMs: you have to micromanage them so much that it runs counter to the core promise that has been pushed for 3+ years now.
scrollaway
4 months ago
If I ask you to count the r’s in strawberry, do you whip out your Python tool?
hansmayer
4 months ago
That depends on the context, obviously. If you had asked me to count them in every "strawberry" in a text file, then I might whip out Python or some combination of bash, awk and sed. If you asked me in a conversation, I might close my eyes, visualise the string and use my visual cortex tool to count them in memory. If you gave me a piece of paper with the word on it, I might use my 'eye' or 'finger' tool to count them. There are numerous approaches, based on the problem setting as you can see, but they have one thing in common: you don't need to specifically tell me what tool to use. I will infer it myself, based on the context. Something an LLM almost never does.
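(The file case collapses to a couple of lines anyway, e.g. with a hypothetical input.txt:)

    # r's contributed by every occurrence of "strawberry" in a file (input.txt is made up)
    text = open("input.txt").read().lower()
    print(text.count("strawberry") * "strawberry".count("r"))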
curioussquirrel
4 months ago
This is a very good answer and I'm commenting only to bring more attention to it apart from voting up. Well put!
Lerc
4 months ago
Specifically for simple character-level questions: if LLMs did that automatically, we would be inundated with stories about "AI model caught cheating".
They are stuck in a place where the models are expected to do two things simultaneously. People want them to show the peak of pure AI ability while at the same time being the most useful they can be.
Err too much on the side of automatic tool use and people will claim you're just faking it; fail to use tools sufficiently and people will claim that the AI is incapable of operations that any regular algorithm could do.
hansmayer
4 months ago
Are you sure? Isn't one aspect of intelligence being able to use, apply and develop tools? Isn't that the core feature that got humanity ahead of other mammals? As an early adopter, I couldn't have cared less if the AI was cheating in strictly academic terms. I care about results. Let's say we're working on something together and I ask you what 123921 multiplied by 1212 is. As the most natural thing, you will pull out your calculator and give me the result. Do I care how you reached it? No, as long as the result is correct, reliable, repeatable and quick, AND I did not specifically ask you to perform the calculation by hand or only with your mental faculties. So this is what's missing from those tools: because we have to remember to tell them, for each and every use case, HOW to do it, they are not intelligent.
simonw
4 months ago
If you care enough about this you can stick a note in your own custom instructions about it.
If you allow ChatGPT to use its memory feature (I deliberately turn that off) and ask those kinds of questions enough it might even make a note about this itself.
hansmayer
4 months ago
Yeah, that sounds obvious, but unfortunately my experience does not align with this (and I've heard similar from others). I am not using ChatGPT, but another tool within an IDE. I was excited about custom or "default" instructions, until it turned out they work maybe 50% of the time. So you end up repeating "make sure to include .github/custom.md", which is effectively the same crap. So we got ourselves a tool which adds to our cognitive load, great :)
simonw
4 months ago
Which tool and which model? Those make a significant difference here.
hansmayer
4 months ago
Well, for such a trivial feature, e.g. loading user settings, it actually should not matter, as this too is a problem we solved decades ago in many deterministic ways. But if it does, then we have an extremely fragile technology being promised as the solution to all of humanity's problems. The tool we use is GitHub Copilot with the entire model offering, out of which we mostly use Claude Sonnet 4. Since they started enshittifying it over the last several months though, as you are probably aware, we reverted from agent mode to mainly using it just as an annoying and verbose replacement for the enshittified Google search.
dan-robertson
4 months ago
I think the intuition is that they don’t ‘know’ that they are bad at counting characters and such, so they answer the same way they answer most questions.
hansmayer
4 months ago
Well, they can be made to use custom tools for writing to files and such, so I am not sure if that is the real reason? I have a feeling it is more because of trying to make this an "everything technology".
kingkongjaffa
4 months ago
I suppose the codewriting tools could also just write code to do this job if prompted