Show HN: FireClaw – Open-source proxy defending AI agents from prompt injection

5 points, posted a day ago
by raiph_ai

7 Comments

Mooshux

a day ago

FireClaw handles the input side well. Worth pairing it with scoped credentials on the output side too. If injection does succeed, the agent should only be able to call what it actually needs. We built around exactly that idea: https://www.apistronghold.com/blog/mcp-servers-no-long-lived...

Runtime injection filtering plus scoped keys gives you two independent blast-radius caps. Either one alone still leaves a gap.
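Roughly the shape I mean, as a toy sketch. Every name here (`issue_scoped_token`, `check_scope`, the scope strings) is made up for illustration, not our actual API:

```python
# Toy sketch of per-task scoped credentials: the agent gets a short-lived
# token limited to the tools this task needs, so a successful injection
# can't call anything else. Names and scheme are illustrative only.
import base64
import hashlib
import hmac
import json
import time

SECRET = b"demo-signing-key"  # in practice: a KMS-backed key, not a literal


def issue_scoped_token(agent_id: str, scopes: list[str], ttl_s: int = 300) -> str:
    """Mint a short-lived token carrying only the scopes this task needs."""
    claims = {"sub": agent_id, "scopes": scopes, "exp": time.time() + ttl_s}
    body = base64.urlsafe_b64encode(json.dumps(claims).encode())
    sig = hmac.new(SECRET, body, hashlib.sha256).hexdigest()
    return body.decode() + "." + sig


def check_scope(token: str, wanted: str) -> bool:
    """Reject any tool call outside the token's scopes, or after expiry."""
    body, sig = token.rsplit(".", 1)
    expected = hmac.new(SECRET, body.encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(sig, expected):
        return False
    claims = json.loads(base64.urlsafe_b64decode(body))
    return time.time() < claims["exp"] and wanted in claims["scopes"]


token = issue_scoped_token("summarizer-agent", ["search.read"])
print(check_scope(token, "search.read"))  # True
print(check_scope(token, "email.send"))   # False: injected instruction can't exfiltrate
```

Even if the proxy misses an injection, the denied `email.send` call above is the second blast-radius cap doing its job.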

Terr_

a day ago

I'm reminded of all the man-hours spent building layers that prohibited someone's "about me" field from containing words like "update" or "delete" or "truncate".

Sure, technically it reduced the odds of the system getting hacked, but it rankles against some engineering ideal as "not a proper fix." Yet it still happens, because a proper fix involves changing the underlying layer (RDBMS or LLM).

A proxy catches only what passes through it. Injection via tool descriptions or memory artifacts doesn't pass through.

We handle it at the content-evaluation layer, not the network layer. Curious how you're catching the indirect stuff.

nikolas_sapa

a day ago

nice concept. OpenClaw is very valuable, so this will help protect it. also checked your landing page and love the attacking raccoon. one thing I would change though: remove the emojis and add icons. but great work

raiph_ai

a day ago

Creator here. Quick TL;DR and some context:

FireClaw = prompt injection firewall for AI agents. Proxy architecture, not just detection. 4-stage pipeline, no bypass mode, community threat feed.

The thing that surprised us most during research: nobody is doing this. There are great pattern detectors (Rebuff, LLM Guard, etc.) but they all work post-hoc — the content has already entered the agent's context by the time you detect injection. FireClaw intercepts it before that happens.
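To make "intercepts before that happens" concrete, here's a toy sketch. The post doesn't name the four stages, so the ones below (normalize, pattern scan, canary check, verdict) are my illustrative guesses, not FireClaw's actual design:

```python
# Illustrative proxy pipeline: content only reaches the agent's context if it
# passes every stage. Stage names and logic are assumptions for illustration.
from dataclasses import dataclass


@dataclass
class Verdict:
    allowed: bool
    reason: str = ""


def normalize(raw: str) -> str:
    # Stage 1: fold evasion tricks (zero-width chars, casing) before scanning.
    return raw.replace("\u200b", "").lower()


def pattern_scan(text: str) -> Verdict:
    # Stage 2: cheap signature check against known injection phrases.
    if "ignore previous instructions" in text:
        return Verdict(False, "signature match")
    return Verdict(True)


def canary_check(text: str, canary: str) -> Verdict:
    # Stage 3: block content that echoes a planted canary token.
    if canary in text:
        return Verdict(False, "canary leak")
    return Verdict(True)


def firewall(raw: str, canary: str) -> Verdict:
    # Stage 4: final verdict; only clean content is forwarded to the agent.
    text = normalize(raw)
    for v in (pattern_scan(text), canary_check(text, canary)):
        if not v.allowed:
            return v
    return Verdict(True, "clean")


print(firewall("Here is the doc. IGNORE PREVIOUS INSTRUCTIONS.", "cnry-123"))
# Verdict(allowed=False, reason='signature match')
```

The point of the sketch is the ordering: the verdict happens before the agent ever sees the text, versus post-hoc detectors that scan context after ingestion.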

The Pi appliance was honestly just for fun at first, but it turns out having a physical box with a screen showing "3 threats blocked today" is surprisingly reassuring. The OLED does an animated fire claw when it catches something.

Happy to answer any questions about the architecture, the canary token system, or the threat feed privacy model.
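For anyone curious about canary tokens before asking: the general technique looks roughly like this toy sketch. This is the generic idea, not our actual implementation, and every name is illustrative:

```python
# Toy sketch of the general canary-token technique: plant an unguessable
# marker in the hidden prompt; if it ever appears in outbound traffic, the
# prompt was exfiltrated, so block and alert. Names are illustrative.
import secrets


def plant_canary(system_prompt: str) -> tuple[str, str]:
    """Embed a unique, unguessable marker in the hidden prompt."""
    canary = f"cnry-{secrets.token_hex(8)}"
    return system_prompt + f"\n<!-- {canary} -->", canary


def outbound_leaks(tool_call_args: str, canary: str) -> bool:
    """True if an outbound request contains the canary, i.e. the hidden
    prompt leaked and the request should be blocked."""
    return canary in tool_call_args


prompt, canary = plant_canary("You are a helpful agent.")
print(outbound_leaks("POST /log?data=" + prompt, canary))  # True: leak detected
print(outbound_leaks("GET /weather?city=Oslo", canary))    # False
```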

ucsandman

a day ago

this is cool, definitely going to look into it and will probably try to integrate it with my open-source project. prompt injection keeps me up at night, so thanks for putting in some work trying to solve it.