niyikiza
3 hours ago
I've been going down this exact rabbit hole for the last few months. The 'opt-in guardrails' problem you mentioned is the dealbreaker. If the agent can just ignore the read_file tool wrapper and call os.system('cat ...'), the policy is useless.
I ended up building a 'capability token' primitive (think Macaroons or Google Zanzibar, but for ephemeral agent tasks) to solve this.
My approach (Tenuo) works like this:
1. Runtime Enforcement: The agent gets a cryptographically signed 'Warrant' that mechanically limits what the runtime allows. It’s not a 'rule' the LLM follows; it’s a constraint the runtime enforces (e.g., fs:read is only valid for /tmp/*).
2. Attenuation: As the agent creates sub-tasks, it can only delegate less authority than it holds.
3. Offline Verify: I wrote the core in Rust so I can verify these tokens in ~27µs on every single tool call without a network round-trip.
If you are building a POC, feel free to rip out the core logic or use the crate directly. I’d love to see more tools move away from 'prompt engineering security' toward actual runtime guarantees.