gemini2026
6 hours ago
AI agents (Claude Code, Cline, Aider, OpenClaw) execute real side effects — writing
files, running shell commands, making network requests. Most security approaches
evaluate each action in isolation against a blocklist. That misses the pattern that
actually matters.
Gatekeeper tracks behavioural state across the entire session. If an agent reads
credentials, then ingests content from an untrusted source, and then attempts a network
call — that combination triggers escalation to human review, even if each individual
action would normally be allowed. We call it the exfiltration trifecta:
read_sensitive + ingested_untrusted + has_egress.
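The trifecta check is essentially stateful policy evaluation: decide against flags accumulated over the whole session, not the single call. A minimal Python sketch, with invented tool names, path patterns, and flag names (not Gatekeeper's actual schema):

```python
from dataclasses import dataclass

SENSITIVE_PATHS = (".env", ".aws/credentials", "id_rsa")  # illustrative patterns
EGRESS_TOOLS = {"fetch", "http_request"}                   # illustrative tool names
UNTRUSTED_INGEST_TOOLS = {"fetch", "browse"}               # illustrative tool names

@dataclass
class SessionState:
    """Behavioural flags carried across the session."""
    read_sensitive: bool = False
    ingested_untrusted: bool = False

def evaluate(state: SessionState, tool: str, arg: str) -> str:
    """Decide against the accumulated state, then update it for later calls."""
    decision = "ALLOW"
    # Trifecta: egress attempted after both risk flags were set earlier.
    if tool in EGRESS_TOOLS and state.read_sensitive and state.ingested_untrusted:
        decision = "ASK"  # pause execution, escalate to human review
    if tool == "read_file" and any(p in arg for p in SENSITIVE_PATHS):
        state.read_sensitive = True
    if tool in UNTRUSTED_INGEST_TOOLS:
        state.ingested_untrusted = True
    return decision
```

The point is that the "ASK" branch fires on the combination: the same `fetch` that was allowed earlier in the session gets escalated once credentials have been read and untrusted content ingested.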
OpenClaw is the tightest integration: Gatekeeper launches it as a managed child
process inside an OS-native sandbox (macOS sandbox-exec, Linux unshare), generates
its config automatically, and intercepts every tool call before it executes. One
command: `gatekeeper run --agent openclaw --workspace /path/to/project`.
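Under the hood, the managed-child launch presumably amounts to wrapping the agent's argv in the platform's sandbox wrapper. A hedged sketch with an invented helper and a deliberately oversimplified macOS profile (real generated profiles need many more allowances to let an agent run at all):

```python
import sys

def sandbox_argv(agent_cmd: list[str], workspace: str,
                 platform: str = sys.platform) -> list[str]:
    """Prefix an agent command with the platform's native sandbox wrapper.
    Illustrative only; not Gatekeeper's actual config generation."""
    if platform == "darwin":
        # sandbox-exec takes an inline SBPL profile: deny by default,
        # then allow execution and file access under the workspace.
        profile = (
            "(version 1) (deny default) (allow process*) "
            f'(allow file-read* file-write* (subpath "{workspace}"))'
        )
        return ["sandbox-exec", "-p", profile] + agent_cmd
    # Linux: detach network and mount namespaces via unshare(1).
    return ["unshare", "--net", "--mount", "--"] + agent_cmd
```

The interesting design choice is that the sandbox is belt-and-braces: even if a tool call slips past policy evaluation, the OS still confines what the child process can touch.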
Other things it does:
- Policy-as-code: YAML rulepacks signed with Ed25519 (tamper-evident, auditable)
- Approval flow: ASK decisions pause execution and wait for human approval in a UI
- Append-only audit log with SHA-256 hash chain
- Prompt injection scanner on tool call inputs/outputs (16 patterns, NFKC normalized)
- Agent identity guard: blocks writes to CLAUDE.md, .cursorrules, system_prompt files
- Claude Code, Cline, Aider, and Continue also supported via MCP or REST
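The hash-chained audit log is a standard construction: each record's hash covers the previous record's hash, so editing any earlier entry breaks every later one. A minimal sketch (field names are assumptions, not Gatekeeper's actual record format):

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder hash before the first record

def append_entry(log: list[dict], entry: dict) -> None:
    """Append a record whose hash commits to the previous record's hash."""
    prev = log[-1]["hash"] if log else GENESIS
    payload = json.dumps({"prev": prev, "entry": entry}, sort_keys=True)
    digest = hashlib.sha256(payload.encode()).hexdigest()
    log.append({"prev": prev, "entry": entry, "hash": digest})

def verify(log: list[dict]) -> bool:
    """Recompute the chain; any tampered or reordered record fails."""
    prev = GENESIS
    for rec in log:
        payload = json.dumps({"prev": prev, "entry": rec["entry"]}, sort_keys=True)
        if rec["prev"] != prev or hashlib.sha256(payload.encode()).hexdigest() != rec["hash"]:
            return False
        prev = rec["hash"]
    return True
```

Append-only plus the chain gives tamper evidence, not tamper prevention: an attacker with write access can truncate the log, but cannot silently rewrite history in the middle.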
Honest limitations: Gatekeeper operates at the execution boundary, not the cognitive
layer. If an agent's context was poisoned before any tool call fires, Gatekeeper
won't catch the injection itself, only its downstream consequences.