Was the whole lib and website vibe coded? I can't find any instructions on how to use it, the repo is for the website itself and the readme is AI blurb that doesn't make me any wiser.
// Test your AI system
const results = await injector.runTests(yourAISystem);
???
Even the "prompt-injector" NPM package is something completely different. Does this project even exist?
The website copy is obviously generated, and has not been reviewed for correctness.
The website trumpets "25+ curated prompt injection patterns from leading security research". The README of the linked Github promises: "100+ curated injection patterns from JailbreakBench".
None of the research sources are actually linked for us to review.
The README lists "integrations" with various security-oriented entities, but no such integration is apparent in the code.
The project doesn't earn the credibility it claims for itself. Because the author trusts bad LLM output enough to publish it as their own work, we have to assume that they don't have the knowledge or experience to recognize it as bad output.
Sorry for the bluntness, but there are few classes of HN submission that rankle as much as these polished bits of fluff. My advice: do not use AI to publicly imply abilities or knowledge you don't have; it will never serve you well.
Yes, to be completely honest this is a vibe coded project and I'm by no means a security expert. This was more of a fun, side project/experiment based on a shower thought. I admit it's not good/disingenuous to imply security knowledge, but for what it's worth, I just prompted Claude to research the latest papers on prompt injection and it made the claims on its own. Again this should not be an excuse for not reviewing the AI's output more carefully, so in the future I'll be more careful with LLM output and also present it as a vibe-coded project. Apologies, I'm just a noob in prompt injection security who doesn't know what he's doing :(
There's absolutely no problem with not knowing what you're doing! Just, you know, own it.
Part of what I find exhausting about projects like this is I can't see any evidence of the person who ostensibly created it. No human touch whatsoever - it's a real drag to read this stuff.
By all means, vibe code things, but put your personal stamp on it if you want people to take notice.
yes absolutely, updating the page now as we speak!
Your feedback is valuable and correct, I'll extract the library into /core in the repo and also manually verify all the citations. I'll read into the prompt injection literature more deeply and turn this from a shower thought project into something more mature
What are some good prevention mechanisms for this? A sort of firewall for prompts? I've seen people recommend LLMs, but that seems like it wouldn't work well. What is the industry standard? Or what looks promising at least?
Nothing yet.
Probably a new kind of model needs to be trained that can find injected prompts, sort if like an immune system for LLMs.
Then the sanitized data can be passed to the LLM after.
No real solution for it yet. I would be interested to try to train a model for this but no budget atm.
Why did you use something as heavy as SvelteKit for a website with a single page? This doesn't inspire confidence.
Sveltekit is not heavy, it is compiled into lightweight bundles