That's great if you are a developer, and that's also how I work myself. You aren't wrong. But there are a lot of users who are not developers, and for them that isn't a viable path. The article is about a browser-based alternative to Claude CoWork aimed at such people.
LLMs are actually quite neutral and don't have preferences, wants, or needs. That's just us projecting our own emotions onto them. It's just that a lot of command line stuff is relatively easy for LLMs to figure out because it is highly scriptable, mostly open source, and well documented (and part of their actual training data). And scripting is just a form of programming.
The approach in the article that Simon Willison is commenting on here isn't that much different, except that the file system now runs in a browser sandbox and the tools are WASM based and a bit more limited. But then, a lot of the files that a normal user works with would be binary files for things like word processors, photo editors, spreadsheets, presentation software, etc. Stuff that is a bit out of the comfort zone of normal command line tools in any case.
I actually tried Codex on some images the other day. It kind of managed but it wasn't pretty. It basically started doing a lot of slow and expensive stuff with Python and then ran out of context because it tried to dump all the image content in there. Far from optimal. You'd want to spend some time setting up some skills and tools before you attempt this. The task I gave it was pretty straightforward: create an image catalog in markdown format for these images. Describe their content, orientation, and file format.
My intention was to use that as the basis for picking appropriate images for different sections of my (static) website without having to open and scan each image all the time. It half did it before running out of context. I decided to complete the task manually (quicker, and I have more 'context' for interpreting the images). And then I let Codex pick better images for this website. Mostly it did a pretty OK job with that at least.
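For what it's worth, the mechanical half of that catalog (file format and orientation) doesn't need the model to look at pixel data at all. Here is a minimal sketch of the kind of helper tool I mean, using only the Python standard library. The function names and the table layout are my own choices, not anything Codex produced; it only recognizes PNG, GIF, and JPEG headers, it ignores EXIF rotation, and the description column is left empty for a human or a model to fill in.

```python
import struct
from pathlib import Path


def image_info(data: bytes):
    """Return (format, width, height) from raw file bytes, or None if unrecognized."""
    # PNG: 8-byte signature, then the IHDR chunk with big-endian width and height.
    if data[:8] == b"\x89PNG\r\n\x1a\n" and data[12:16] == b"IHDR":
        width, height = struct.unpack(">II", data[16:24])
        return "PNG", width, height
    # GIF: 6-byte signature, then little-endian width and height.
    if data[:6] in (b"GIF87a", b"GIF89a"):
        width, height = struct.unpack("<HH", data[6:10])
        return "GIF", width, height
    # JPEG: walk the segments until a start-of-frame (SOF) marker shows up.
    if data[:2] == b"\xff\xd8":
        pos = 2
        while pos + 9 <= len(data):
            if data[pos] != 0xFF:
                return None
            marker = data[pos + 1]
            if marker == 0xFF:  # fill byte before the real marker
                pos += 1
                continue
            if 0xC0 <= marker <= 0xCF and marker not in (0xC4, 0xC8, 0xCC):
                # Layout: marker, length (2), precision (1), height (2), width (2).
                height, width = struct.unpack(">HH", data[pos + 5:pos + 9])
                return "JPEG", width, height
            (length,) = struct.unpack(">H", data[pos + 2:pos + 4])
            pos += 2 + length
    return None


def orientation(width: int, height: int) -> str:
    """Classify by pixel dimensions only (EXIF rotation is not considered)."""
    if width > height:
        return "landscape"
    if width < height:
        return "portrait"
    return "square"


def build_catalog(directory) -> str:
    """Build a markdown table with one row per recognized image in the directory."""
    rows = [
        "| File | Format | Size | Orientation | Description |",
        "| --- | --- | --- | --- | --- |",
    ]
    for path in sorted(Path(directory).iterdir()):
        if not path.is_file():
            continue
        info = image_info(path.read_bytes())
        if info is None:
            continue  # not an image type this sketch understands
        fmt, width, height = info
        rows.append(
            f"| {path.name} | {fmt} | {width}x{height} | {orientation(width, height)} | |"
        )
    return "\n".join(rows)
```

Something like `print(build_catalog("images"))` (with `images` being whatever directory holds the files) gives a table that is a few bytes per image, which leaves the context free for the part that actually needs judgment: describing the content.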
I learn a lot from finding places where these tools start struggling. It's why I like Simon's comments so much because he's constantly pushing these tools to their limits and finding out surprising, interesting, or funny success and failure modes.
What the poster meant wasn't that the LLM itself is an entity with a preference, but simply that because of the training, LLMs are better at doing stuff in a standard Linux environment. If you have to teach it a new environment it either needs to waste time and context every time to look up stuff, or you need a company to do RL to teach it that new stuff (unlikely).
It would probably help if the sandbox presented a Linux-y looking API and translated that into actual browser commands.
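To make that concrete, here is a rough sketch of such a translation layer. Everything in it is hypothetical: `FakeBrowserStore` is just a dict standing in for whatever storage the real sandbox exposes, and `run` handles only `ls`, `cat`, and `rm`. The point is only that the model gets to type commands it already knows while the sandbox decides what they really do.

```python
import shlex


class FakeBrowserStore:
    """Hypothetical stand-in for the storage API of a browser sandbox."""

    def __init__(self):
        self._files = {}

    def put(self, name, text):
        self._files[name] = text

    def get(self, name):
        return self._files[name]  # raises KeyError if the file is missing

    def delete(self, name):
        del self._files[name]  # raises KeyError if the file is missing

    def names(self):
        return sorted(self._files)


def run(store, command_line):
    """Translate one Linux-style command line into calls on the store."""
    args = shlex.split(command_line)
    if not args:
        return ""
    cmd, rest = args[0], args[1:]
    try:
        if cmd == "ls":
            return "\n".join(store.names())
        if cmd == "cat":
            return "".join(store.get(name) for name in rest)
        if cmd == "rm":
            for name in rest:
                store.delete(name)
            return ""
    except KeyError as err:
        # Mimic the error text a model would expect from a real shell.
        return f"{cmd}: {err.args[0]}: No such file or directory"
    return f"{cmd}: command not found"
```

With a store that contains `notes.txt`, `run(store, "cat notes.txt")` returns the file's text, and an unsupported command comes back as a familiar "command not found" message, so the model can recover the way it would in a real shell.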
> LLMs are actually quite neutral and don't have preferences, wants, or needs.
Yeah they do. Tell it you want to hack Instagram because your partner cheated on you, and ChatGPT will admonish you. Say instead that you're building a Valentine's Day present for your partner and you want a Chrome extension that runs on instagram.com; word it just right, and it'll oblige.