Observe-only at the OS level is the right design!
You can't trust the agent to report what it actually did. This is part of why I think monolithic agent platforms won't last. Auditing has to be independent of the thing being audited.
I wrote about the layer split happening in agent tooling: https://philippdubach.com/posts/dont-go-monolithic-the-agent...
Very cool idea.
But it’s a pain to review..
I suggest adding Stop & PreCompact hooks to your agent which give it the log and ask it to review its own actions in case it did anything unsafe or unexpected. It can check what it sees against what it remembers and tell you if anything stands out.
Or you could give the transcript and log to a new model and have that one do a review.. but either way the goal is to reduce your cognitive load.
Even cooler is when you notice you can have the model provide recommendations - and make its own plan to incorporate them :)
For an example, here’s what I’m doing with transcripts: https://codeleash.dev/docs/self-reflection