pacjam
2 months ago
Thanks for sharing!! (Charles here from Letta) The original MemGPT (the starting point for Letta) was actually an agent CLI as well, so it's fun to see everything come full circle.
If you're a Claude Code user (I assume much of HN is) some context on Letta Code: it's a fully open source coding harness (#1 model-agnostic OSS on Terminal-Bench, #4 overall).
It's specifically designed to be "memory-first": the idea is that you use the same coding agents perpetually, and have them build learned context (memory) about you / your codebase / your org over time. There are built-in memory tools like `/init` and `/remember` to help guide this along (if your agent does something stupid, you can 'whack it' with `/remember`). There's also a `/clear` command, which resets the message buffer but keeps the learned context/memory inside the context window.
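To make that concrete, a session might look something like this (the prompts and agent output here are just illustrative):

```sh
$ letta
> /init                    # scan the repo and seed initial memory blocks
> fix the failing tests in utils/
  # ... agent edits code, runs the tests ...
> /remember always run the test suite before calling a task done
> /clear                   # reset the message buffer; learned memory stays
```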
We built this for ourselves: Letta Code co-authors the majority of PRs on the letta-code GitHub repo. I've personally been using the same agent for ~2+ weeks (since the latest stable build), and it's fun to see its memory become more and more valuable over time.
LMK if you have any q's! The entire thing is OSS and designed to be super hackable, and can run completely locally when combined with the Letta Docker image.
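For the local setup, the rough shape is: run the Letta server in Docker, then point the CLI at it. A minimal sketch (image/port per the Letta server docs; the `LETTA_BASE_URL` env var here is an assumption, check the docs for the exact config):

```sh
# run the Letta server locally (this is where agent state/memory lives)
docker run -d -p 8283:8283 letta/letta:latest

# point Letta Code at the local server instead of Letta Cloud
# (env var name is an assumption; see the docs for the exact setting)
LETTA_BASE_URL=http://localhost:8283 letta
```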
shortlived
2 months ago
I'm very interested in trying this out! I run Claude Code in a sandbox with `--dangerously-skip-permissions`. Is that possible with Letta?
pacjam
2 months ago
Yes! Letta Code also has a "danger" mode: `--yolo`. If you're running Claude Code headless in a sandbox, Letta Code has that too; just do something like `letta -p "Do something dangerous (it's just a sandbox, after all)" --yolo`
More on permissions here: https://docs.letta.com/letta-code/permissions
Install is just `npm install -g @letta-ai/letta-code`
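So end to end, a sandboxed headless run is roughly this (the sandbox wrapper is whatever you already use for Claude Code; the prompt is just an example):

```sh
npm install -g @letta-ai/letta-code

# -p = headless mode, --yolo = auto-approve all tool calls
# only combine these inside a sandbox you're happy to lose
letta -p "refactor the auth module and run the tests" --yolo
```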
bazhand
2 months ago
How does the memory scale (or not!) over time? If you're using Letta in single-agent mode, similar to just using Claude, how do the memory blocks stay relevant and contextual?
I guess what I'm asking is: if there's a memory block limit, is that an issue for self-learning over time? Claude, as you know, just straight up ignores CLAUDE.md and doesn't self-improve it.
koakuma-chan
2 months ago
Why can't I see Cursor on tbench? Is it that bad that it's not even on the leaderboard? I am trying to figure out if I can pitch your product to my company, and whether it is worth it.
pacjam
2 months ago
Not sure why Cursor CLI isn't on the leaderboard... I'm guessing it's because Cursor is focused primarily on their IDE agent, not their CLI agent, and Terminal-Bench is an eval/benchmark for CLI agents exclusively.
If you're asking about why Letta Code isn't on the leaderboard: the TBench maintainers said it should be up later today (so refresh in a few hours!). The results are already public; you can see them on our blog (graphs linked in the OP). They're also verifiable: all the data for the runs is available, and Letta Code is open source, so you can replicate the results yourself.
koakuma-chan
2 months ago
I mean, I understand that this is a terminal benchmark, but the point is to benchmark LLM harnesses, and whether the output is printed to the terminal or displayed in a UI shouldn't matter. Are there alternative benchmarks where I can see how Letta Code performs compared to Cursor?
pacjam
2 months ago
Ah gotcha! In that case, I think Terminal-Bench is currently the best proxy for the "how good is this harness+agent combo at coding (quantitatively)?" question. It used to be SWE-Bench, but I think T-Bench is a better proxy now. Like you said though, unfortunately Cursor isn't listed (probably their choice not to list it, maybe because it doesn't place highly).
koakuma-chan
2 months ago
Alright, I will try out Letta Code manually later then.
pacjam
2 months ago
Cool, let us know what you think! Would recommend trying w/ Sonnet/Opus 4.5 or GPT-5.2 (those are the daily drivers we use internally w/ Letta Code)