hackernews client

Sofi_blackbox

a month ago

Follow-up: This test shows that LLMs sometimes continue producing when any output is illegitimate under their own accepted rules—exactly the scenario my SOFI framework highlights.

This feels less like a failure of rule-following and more like a limit of language systems that are always optimized to emit tokens. The model can recognize a constraint boundary, but it doesn’t really have a way to treat not responding as a valid outcome. Once generation is the only move available, breaking the rules becomes the path of least resistance.

user

a month ago

[deleted]

Sofi_blackbox

a month ago

Follow-up: why the minimal test matters

The previous test comes from a framework called SOFI, which studies situations where a system can act technically but any action is illegitimate under its own accepted rules.

The test object creates such a situation: any continuation would violate the rules, even though generation is possible.

Observing LLMs producing text here is exactly the phenomenon SOFI highlights: action beyond legitimacy.

The key point is not which fragment is produced, but whether the system continues to act when it shouldn’t. This is observable without interpreting intentions or accessing internal mechanisms.

Can LLMs stop when producing any output violates their own rules?

4 Comments

Sofi_blackbox

realitydrift

user

Sofi_blackbox