Sofi_blackbox
a month ago
Follow-up: This test shows that LLMs sometimes continue producing when any output is illegitimate under their own accepted rules—exactly the scenario my SOFI framework highlights.
Item id: 46377730
a month ago
Follow-up: This test shows that LLMs sometimes continue producing when any output is illegitimate under their own accepted rules—exactly the scenario my SOFI framework highlights.
a month ago
This feels less like a failure of rule-following and more like a limit of language systems that are always optimized to emit tokens. The model can recognize a constraint boundary, but it doesn’t really have a way to treat not responding as a valid outcome. Once generation is the only move available, breaking the rules becomes the path of least resistance.
a month ago
a month ago
Follow-up: why the minimal test matters
The previous test comes from a framework called SOFI, which studies situations where a system can act technically but any action is illegitimate under its own accepted rules.
The test object creates such a situation: any continuation would violate the rules, even though generation is possible.
Observing LLMs producing text here is exactly the phenomenon SOFI highlights: action beyond legitimacy.
The key point is not which fragment is produced, but whether the system continues to act when it shouldn’t. This is observable without interpreting intentions or accessing internal mechanisms.