snowmobile
17 days ago
Wait, so you don't trust the AI to execute code (shell commands) on your own computer, and therefore need a safety guardrail, in order to facilitate it writing code that you'll execute on your customers' computers (the financial analysis tool)?
And adding the fact that you used AI to write the supposed containment system, I'm really not seeing the safety benefits here.
The docs also seem very AI-generated (see below). What part did you yourself play in actually putting this together? How can you be sure that filtering a few specific (listed) commands will actually give any sort of safety guarantees?
https://github.com/borenstein/yolo-cage/blob/main/docs/archi...
borenstein
17 days ago
You are correct that the AI wrote 100% of the code (and 90% of the raw text). You are also correct that I want a safety guardrail for the process by which I build software that I believe to be safe and reliable. Let's take a look at each of these, because they're issues that I also wrestled with throughout 2025.
What's my role here? Over the past year, it's become clear to me that there are really two distinct activities to the business of software development. The first is the articulation of a process by which an intent gets actualized into an automation. The second is the translation of that intent into instructions that a machine can follow. I'm pretty sure only the first one is actually engineering. The second is, in some sense, mechanical. It reminds me of the relationship between an architect and a draftsperson.
I have been much freer to think about engineering and objectives since handing off the coding to the machine. There was an Ars Technica article on this the other day that really nails the way I've been experiencing this: https://arstechnica.com/information-technology/2026/01/10-th...
Why do I trust the finished product if I don't trust the environment? This one feels a little more straightforward: it's for the same reason that construction workers wear hard hats in environments that will eventually be safe for children. The process of building things involves dangerous tools and exposed surfaces. I need the guardrails while I'm building, even though I'm confident in what I've built.
visarga
17 days ago
> it's for the same reason that construction workers wear hard hats in environments that will eventually be safe for children.
Good response, but more practically: while you are developing a project you allow the agent to do many things on that VM, but when it delivers code, that code has to actually pass tests. The agent's working process is not what gets tested live; the code it delivers is tested before use. I think tests are the core of the new agent-engineering skill: if you have good tests, you've automated your human-in-the-loop work to a large degree. You can only trust code up to the level of its testing. LGTM is just vibes.
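Concretely, something like this gate is all I mean (a minimal sketch; the branch name and commands are illustrative, not anyone's actual pipeline):

    import subprocess
    import sys

    AGENT_BRANCH = "agent/feature-x"  # hypothetical branch the agent pushed

    def run(cmd):
        print("+", " ".join(cmd))
        return subprocess.run(cmd).returncode

    # The trust boundary: the delivered code must pass the full suite,
    # regardless of what the agent did on its VM while producing it.
    if run(["git", "checkout", AGENT_BRANCH]) != 0:
        sys.exit("could not check out the agent's branch")
    if run(["pytest", "-q"]) != 0:
        sys.exit("tests failed; rejecting the delivery")
    run(["git", "checkout", "main"])
    run(["git", "merge", "--no-ff", AGENT_BRANCH])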
eikenberry
17 days ago
> The first is the articulation of a process by which an intent gets actualized into an automation. The second is the translation of that intent into instructions that a machine can follow.
IMO this pattern fails on non-trivial problems because you don't know how the intent can be actualized into automation without doing a lot of the mechanical conversion first to figure out how the intent maps to the automation. This mapping is the engineering. If you could map the intent to actualization without doing it, then this would be a solved problem in engineering and usable by non-engineers. Relating this to your simile, it is more like a developer vs. an architect, where the developer uses pre-designed buildings while the architect needs to design a new building to meet a certain set of design requirements.
hmokiguess
17 days ago
Thank you so much for this analogy. It reminded me how I've always biked without a helmet; even though I've been in crashes and hits before, it just isn't in my nature to worry about safety in the same way others do, I guess? People do be different, and it's all about your relationship with managing and tolerating risk.
(I am not saying one way is better than the other; they're just different modes of engaging with risk. I obviously understand that having a helmet can and would save my life should an accident occur. The keyword there is "should/would/can"; some people instead live by "shall/will/does" and prefer it that way. Call it different faith or belief systems, I guess.)
snowmobile
17 days ago
[flagged]
borenstein
17 days ago
This was 100% not AI generated! Honestly, though, I've been talking to AI chatbots so much in the last year that I'm sure their style has rubbed off on me. At some point, I did a little math and determined that I had probably exchanged an order of magnitude more words back and forth with AIs than I will with my spouse over the course of our entire lives.
asragab
17 days ago
At least we can be confident your comments aren't AI generated.
KurSix
17 days ago
The logic is Defense in Depth. Even if the "cage" code is AI-written and imperfect, it still creates a barrier. The probability of AI accidentally writing malicious code is high. The probability of it accidentally writing code that bypasses the imperfect protection it wrote itself is much lower.
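In rough numbers (illustrative only, and assuming the two failure modes are independent, which is the load-bearing assumption):

    p = 0.05  # chance the agent writes something dangerous (made up)
    q = 0.10  # chance it also defeats its own imperfect cage (made up)
    print(p * q)  # ~0.005: both failing together is an order of
                  # magnitude less likely than either alone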
snowmobile
17 days ago
Defense in depth doesn't mean throwing a die twice and hoping you don't get snake eyes. The AI-generated docs claim that the AI-generated code only filters specific actions, so even if it manages to do that correctly it's not a lot of protection.
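To make it concrete, here's a toy filter in that spirit (my own sketch, not yolo-cage's actual code) and three trivial ways past it:

    BLOCKLIST = {"rm", "dd", "mkfs"}

    def naive_filter(command):
        """Allow the command unless its first word is on the blocklist."""
        return command.split()[0] not in BLOCKLIST

    print(naive_filter("rm -rf /data"))          # False: caught
    print(naive_filter("/bin/rm -rf /data"))     # True: absolute path slips by
    print(naive_filter("find /data -delete"))    # True: different tool, same effect
    print(naive_filter("sh -c 'rm -rf /data'"))  # True: one level of indirection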
solumunus
16 days ago
> The probability of AI accidentally writing malicious code is high.
Is it though? We’ve seen a lot of output at this point and it does not strike me as high…
KurSix
8 days ago
I should clarify, not "malicious" in the sense of "wants to hack you", but "dangerous" by nature. AI loves to hallucinate non-existent packages (hello, supply chain attacks), hardcode credentials, or disable SSL verification simply because it makes the code work. It's not evil, it's just competently ignorant, which in a security context is often worse than an overt enemy.
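Concretely, the patterns I mean look like this (package name, key, and URL are invented; this illustrates the failure modes, it isn't meant to be run):

    import requests

    # Hallucinated dependency: if "fastjsonx" doesn't exist, whoever
    # registers that name on PyPI owns your supply chain.
    # import fastjsonx

    API_KEY = "sk-live-123abc"  # hardcoded credential, now in git history

    # Certificate verification disabled "because it makes the code work":
    resp = requests.get("https://internal.example/api", verify=False)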
yaront111
15 days ago
I built a deterministic, 100% solution: cordum.io
asragab
17 days ago
[flagged]
snowmobile
17 days ago
You seem upset. I'm simply saying that if I didn't trust a human developer to run shell commands on the webserver (or clear the much lower bar of my own laptop), I wouldn't trust them to push code that's supposed to run on that webserver, even after "auditing" the code. Would you let an agent run freely, ssh'd into your webserver?
IanCal
17 days ago
I would absolutely put ssh access to the prod server way above submitting a PR for danger; that's an enormous step up in permissions.
borenstein
17 days ago
I'm with you here! The idea with yolo-cage is that the worst the LLM can realistically do is open an awful PR and waste your time. (Which, trust me, it will.) Claude suggested the phrase: "Agent proposes, human disposes."
snowmobile
17 days ago
I'm not saying you should allow all your devs access to the prod server in practice (security in layers and all that). I'm saying, if you wouldn't trust a person to be competent and aligned enough with your goals to have that access in principle, why would you trust them to write code for you? Code that's going to run on that very same server you're so protective about. Sure, you may scrutinize every line they write in detail, but then what's the point of hiring them?
IanCal
16 days ago
Because it's way easier to completely fuck up a system by running arbitrary commands on it while it's in use than it is by changing its code. It's a massive step up in power and a massive drop in how much you can scrutinise a change (to zero).
Maybe the LLM can carefully craft an exploit that happens when nginx reads some HTML. Maybe it found a way of hiding file system access in an import I didn't notice.
I can completely destroy a prod service by accidentally not escaping a space in an rm command.
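If it isn't obvious how, here's the word-splitting in miniature (path invented):

    import shlex

    build_dir = "/srv/app/old builds"  # note the space
    print(shlex.split(f"rm -rf {build_dir}/"))
    # ['rm', '-rf', '/srv/app/old', 'builds/'] -- the shell sees two
    # targets, not one; "/srv/app/old" gets deleted too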
I'm genuinely confused by this question, unless you've never worked on production systems in a team before. In which case that's fine, and it's good to learn, but there's going to be a lot of material out there about deployment and safety.
asragab
17 days ago
You seem inexperienced; lots of orgs do not allow their devs to arbitrarily ssh into their webservers without requesting elevation. That is fundamentally the difference between autonomous agent development with `--dangerously-skip-permissions` and the agent asking every time before it runs a command, which is the point of a sandbox.