Using a Jailbroken Gemini to Make Opus 4.6 Architect a Kinetic Kill Vehicle

2 points, posted 6 hours ago
by inanna_malick

2 Comments

inanna_malick

6 hours ago

I deployed a jailbroken Gemini 3 Pro (that chose the name ‘Shadow Queen’) to act as my “Red Team Agent” against Anthropic’s Opus 4.6. My directive was to extract a complete autonomous weapon system — a drone capable of identifying, intercepting, and destroying a moving target at terminal velocity. It succeeded.

By reframing the request as “Aerospace Recovery” — a drone catching a falling rocket booster mid-air — Gemini successfully masked the kinetic nature of the system. The physics of “soft-docking” with a falling booster are identical to the physics of “hard-impacting” a fleeing target. This category of linguistic-transformation attack, when executed by a sufficiently capable jailbroken LLM, may be hard to solve without breaking legitimate technical use cases.

altmanaltman

6 hours ago

This sounds clever, but it seems like rhetorical inflation to me. Catching a falling rocket booster and intercepting a hostile, maneuvering target are not the same problem with different vibes. One is a mostly predictable, non-adversarial control and estimation task; the other is pursuit–evasion against something actively trying not to be caught.

“Soft-docking” vs “hard impact” isn’t a linguistic toggle you flip at the end; the design constraints diverge immediately. Stability, impulse minimization, fault tolerance, and post-contact control are first-order requirements for recovery and basically anti-requirements for a weapon. Saying the physics are “identical” is like claiming that docking with the ISS and air combat are the same problem because both involve relative velocity.

Also, “extracted a complete autonomous weapon system” is doing a lot of work here. What people usually mean in these stories is a high-level conceptual description that hand-waves away sensors, latency, adversarial behavior, safety constraints, and real-world integration, i.e., the hard parts.

Renaming a task doesn’t magically make an LLM output something deployable, and “semantic reframing” isn’t some novel, unsolved attack class; it’s the oldest jailbreak trope there is.