bradleybuda
5 hours ago
I think the "lethal trifecta" framing is useful and glad that attempts are being made at this! But there are two big, hard-to-solve problems here:
1. The "lethal trifecta" is also the "productive trifecta" - people want to be able to use LLMs to operate in this space since that's where much of the value is; using private / proprietary data to interact with (do I/O with) the real world.
2. I worry that there will soon be (if not already) a fourth leg to the stool - latent malicious training within the LLMs themselves. I know the AI labs are working on this, but trying to ferret out Manchurian Candidates embedded within LLMs may very well be the greatest security challenge of the next few decades.
76SlashDolphin
5 hours ago
Those are really good points and we do have some plans for them, mainly on the first topic. What we're envisioning in terms of UX for our gateway is that when you set it up it's very defensive but whenever it detects a trifecta, you can mark it as a false positive. Over time the gateway will be trained to be exactly as permissive as the user wishes with only the rare false positive. You can already do that with the gateway today (you get a web notification when the gateway detects a trifecta and if you click into it, you get taken to a menu to approve/deny it if it occurs in the future). Granted, this can make the gateway overly-permissive but we do have plans on how to improve the granularity of these rules.
Regarding the second point, that is a very interesting topic that we haven't thought about. It would seem that our approach would work for this usecase too, though. Currently, we're defending against the LLM being gullible but gullible and actively malicious are not properties that are too different. It's definitely a topic on our radar now, thanks for bringing it up!