aktau
3 months ago
I have a bunch, but one that I rarely see mentioned but use all the time is memo(1) (https://github.com/aktau/dotfiles/blob/master/bin/memo).
It memoizes the command passed to it.
$ memo curl https://some-expensive.com/api/call | jq . | awk '...'
Manually clearing it (for example, if I know the underlying data has changed):
$ memo -c curl https://some-expensive.com/api/call
In-pipeline memoization (includes the input in the hash of the lookup):
$ cat input.txt | memo -s expensive-processor | awk '...'
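To illustrate what -s means for the cache key (a contrived sketch; expensive-processor is a stand-in for any slow filter):
$ echo a | memo -s expensive-processor   # runs expensive-processor
$ echo a | memo -s expensive-processor   # cache hit: same command *and* same input
$ echo b | memo -s expensive-processor   # runs again: the input changed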
This allows me to rapidly iterate on shell pipelines. The main goal is to minimize my development latency, but it also has positive effects on dependencies (avoiding redundant RPC calls). The classic way of doing this is storing something in temporary files:
$ curl https://some-expensive.com/api/call > tmpfile
$ cat tmpfile | jq . | awk '...'
But I find this awkward, and it makes it harder than necessary to experiment with the expensive command itself:
$ memo curl https://some-expensive.com/api/call | jq . | awk '...'
$ memo curl --data "param1=value1" https://some-expensive.com/api/call | jq . | awk '...'
Both of those will run curl once.
NOTE: Currently environment variables are not taken into account when hashing.
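A contrived illustration of that caveat (proxy hosts are made up): curl honors the http_proxy environment variable, but since only the command string is hashed, the second invocation below returns the first one's cached output even though going through a different proxy could yield a different result.
$ http_proxy=http://proxy-a:3128 memo curl https://some-expensive.com/api/call
$ http_proxy=http://proxy-b:3128 memo curl https://some-expensive.com/api/call   # cache hit, possibly stale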
aabdelhafez
3 months ago
You're gonna absolutely love up (https://github.com/akavel/up).
If you pipe curl's output to it, you'll get a live playground where you can finesse the rest of your pipeline.
$ curl https://some-expensive.com/api/call | up
aktau
3 months ago
up(1) looks really cool, I think I'll add it to my toolbox.
It looks like up(1) and memo(1) have similar use cases (or goals). I'll give it a try to see if I can appreciate its ergonomics. I suspect memo(1) will remain my mainstay:
1. After executing a pipeline, I like to press the up arrow (heh) and edit. Surprisingly often I need to edit something that's *not* the last part, but somewhere in the middle. I find this cumbersome in default line editing mode, so I will often drop into my editor (^X^E) to edit the command.
2. Up seems to create a shell script file after completion. Avoiding the creation of extra files was one of my goals for memo(1). I'm sure some smart zsh/bash integration could be made that just returns the completed command after completing; a rough sketch follows below.
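Something like this, perhaps (an untested, zsh-only sketch; it assumes up(1) writes its result to ./up1.sh and that the pipeline is the last line of that file):
up-edit() {
  up                                    # build the pipeline interactively
  if [[ -f up1.sh ]]; then
    print -z -- "$(tail -n 1 up1.sh)"   # push the pipeline into the zsh edit buffer
    rm up1.sh                           # ...instead of leaving the file around
  fi
}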
aktau
3 months ago
Another thing I built into memo(1) which I forgot to mention: automatic compression. memo(1) will use available (de)compressors (in order of preference: zstd, lz4, xz, gzip) to (de)compress stored contents. It's surprising how much disk space and IOPS can be saved this way due to redundancy.
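The selection logic amounts to something like this (a simplified sketch, not the actual memo(1) source):
for c in zstd lz4 xz gzip; do
  if command -v "$c" >/dev/null 2>&1; then
    compressor=$c   # first available compressor wins
    break
  fi
done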
I currently only have two memoized commands:
$ for f in /tmp/memo/aktau/* ; do
    # list the compressed file next to a decompressed copy (zsh =() substitution)
    ls -lh "$f" =(zstd -d < "$f")
  done
-rw-r----- 1 aktau aktau 33K /tmp/memo/aktau/0742a9d8a34c37c0b5659f7a876833b6dad9ec689f8f5c6065d05f8a27d993c7bbcbfdc3a7337c3dba17886d6f6002e95a434e4629.zst
-rw------- 1 aktau aktau 335K /tmp/zshSQRwR9
-rw-r----- 1 aktau aktau 827 /tmp/memo/aktau/8373b3af893222f928447acd410779182882087c6f4e7a19605f5308174f523f8b3feecbc14e1295447f45b49d3f06da5da7e8d7a6.zst
-rw------- 1 aktau aktau 7.4K /tmp/zshlpMMdo
That's roughly a 10x compression ratio.
dotancohen
3 months ago
This is terrific! I curl to files and then pipe them, all the time. This will be a great help.
I wonder if we have gotten to the point where we can feed an LLM our bash history and it could suggest improvements to our workflow.
edanm
3 months ago
Interesting idea. And pretty easy to try.
If you do it, I'd love to hear your results.
In general, I wonder if we're at the point where an LLM watching you interact with your computer for twenty minutes could improve your workflow, suggest tools, etc. I imagine so: when I think to ask how to do something, I often get a very useful answer, so I've automated/fixed far more things than I used to.
1vuio0pswjnm7
3 months ago
#!/usr/bin/env bash
#
# memo(1), memoizes the output of your command-line, so you can do:
#
# $ memo <some long running command> | ...
#
# Instead of
#
# $ <some long running command> > tmpfile
# $ cat tmpfile | ...
# $ rm tmpfile
To save output, sed can be used in the pipeline instead of tee. For example:
x=$(mktemp -u)
test -p "$x" || mkfifo "$x"        # create a fifo as a side channel
zstd -19 < "$x" > tmpfile.zst &    # compress whatever arrives on the fifo
<long running command> | sed "w$x" | <rest of pipeline>   # sed's w command copies every line to the fifo
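For comparison, the tee-based equivalent of the above (using bash/zsh process substitution rather than an explicit fifo):
<long running command> | tee >(zstd -19 > tmpfile.zst) | <rest of pipeline>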
# You can even use it in the middle of a pipe if you know that the input is not
# extremely long. Just supply the -s switch:
#
# $ cat sitelist | memo -s parallel curl | grep "server:"
grep can be replaced with sed, and the search results sent to stderr:
< sitelist curl ... | sed '/server:/w/dev/stderr' | zstd -19 > tmpfile.zst
Or send the search results both to stderr and to some other file; sed can write to multiple files at a time:
< sitelist curl ... | sed -e '/server:/w/dev/stderr' -e '/server:/wresults.txt' | zstd -19 > tmpfile.zst
aktau
3 months ago
Those commands are (1) harder to grok and (2) do not actually use the memoized result (tmpfile.zst) to speed up a subsequent run.
Can you give a more complete example of how you would use this to speed up developing a pipeline?
1vuio0pswjnm7
3 months ago
If you provide a sample showing (a) the input format of the text and (b) the desired output format, then perhaps I can provide an example of how to do the text processing.
gavinray
3 months ago
15 years of Linux and I learn something new all the time...
mlrtime
3 months ago
It's why I keep coming back. Now, how do I remember to use this and not go back to using tmpfiles? :)
divan
3 months ago
I've used the Warp terminal for a couple of years, and recently they embedded AI into it. At first I was irritated and disabled it, but the AI agent is built in as an optional mode (Cmd-I to toggle). I found myself using it more and more often for commands that I have no capacity or will to remember or dig through the man pages for (from "figure out my IP address on the wifi interface" to "make ffmpeg do this or that"). It's fast and can iterate on its own errors, and now I can't resist using it regularly. It removes the need for "tools to memorize commands" entirely.
news_hacker
3 months ago
I've been using bkt (https://github.com/dimo414/bkt) for subprocess caching. It has some nice features, like providing a TTL for cache expiration. In-pipeline memoization looks nice; I'm not sure bkt supports that.
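For example (from memory, so double-check the flags against bkt's README):
$ bkt --ttl=10m -- curl https://some-expensive.com/api/call | jq .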
aktau
3 months ago
I was not aware of bkt. Thanks for the link. It seems very similar to memo, and has more features:
- Explicit TTL
- Ability to include working directory et al. as context for the cache key.
There do appear to be downsides (from my PoV) as well:
- It's a Rust program, so it needs to be compiled (memo is a bash/zsh script and runs as-is).
- There's no mention of transparent compression, either in the README or through a simple source code search. I did find https://github.com/dimo414/bkt/issues/62, which mentions swappable backends. The fact that it uses some type of database instead of just the filesystem is not a positive for me; I prefer the state to be easy to introspect with common tools. I will often memo commands that output gigabytes of data, which is usually highly compressible. Transparent compression fixes that up. One could argue this could be avoided with a filesystem-level feature, like ZFS transparent compression, but I don't know how to detect that in a cross-FS fashion.
I opened https://github.com/dimo414/bkt/discussions/63 so the author of bkt can perhaps also participate.
Perepiska
3 months ago
Caching some API call because it is expensive, then using the cached data many months later because of a bash suggestion :(
aktau
3 months ago
The default storage location for memo(1) output is /tmp/memo/${USER}. Most distributions have some automatic periodic cleanup of /tmp, wipe it on restart, or both.
Separately from that:
- The invocation contains *memo* right in there, so you (the user) know that it might memoize.
- One uses memo(1) for commands that are generally slow. Rerunning a command that has a slow part and having it return in a millisecond when you weren't expecting it should make your spider-sense tingle.
In practice, this has never been a problem for me, and I've used this hacked-together command for years.
naikrovek
3 months ago
I see no way to name the memo in your examples, so how do you refer to them later?
Also, this seems a lot like an automated way to write shell scripts that you can pipe to and from. So why not use a shell script that won't surprise anyone, instead of this, which might?
aktau
3 months ago
The name of the memo is the command that comes after it:
$ memo my-complex-command --some-flag my-positional-arg-1
In this invocation, a hash (sha512) is taken of "my-complex-command --some-flag my-positional-arg-1", which is then stored in /tmp/memo/${USER}/{sha512hash}.zst (if you've got zstd installed; other compression extensions otherwise).
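In shell terms, the lookup amounts to roughly this (a simplified sketch, not the actual memo(1) source; my-complex-command is the hypothetical command from above):
cmd='my-complex-command --some-flag my-positional-arg-1'
key=$(printf '%s' "$cmd" | sha512sum | awk '{print $1}')
mkdir -p "/tmp/memo/${USER}"
cache="/tmp/memo/${USER}/${key}.zst"
if [[ -f "$cache" ]]; then
  zstd -dc "$cache"                      # cache hit: replay the stored output
else
  eval "$cmd" | tee >(zstd > "$cache")   # miss: run, emit, and store compressed
fi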
sgarland
3 months ago
Dude, this is _awesome_. Thank you for sharing!
aktau
3 months ago
Glad you like it. Hope you get as much use out of it as I do.
cryptonector
3 months ago
> `curl ... | jq . | awk '...'`
Uhm, jq _is_ as powerful as awk (more, even). You can use jq directly and skip awk.
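For instance (with a made-up JSON shape), instead of post-processing the pretty-printed output:
$ curl ... | jq . | awk -F'"' '/name/ {print $4}'
one can usually write the extraction directly in jq:
$ curl ... | jq -r '.items[].name'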
(I know, old habits die hard, and learning functional programming languages is not easy.)
aktau
3 months ago
Yes, I know. I should've picked a different example, but it's also realistic in a way. When I'm doing one-offs, I will sometimes take shortcuts like this. I know awk fairly well, and I know enough jq to know that invoking jq . pretty-prints the inbound JSON on multiple lines. While I could write a proper jq expression, the combo gets me there quicker. Similarly, I'll sometimes do:
$ awk '...' | grep | ...
Because I'm too lazy to go back to the start of the awk invocation and add a match condition there. If I'm going to save it to a script, I'll clean it up. (And for jq, I've got to be honest: my starting point these days would probably be to show my contraption to an LLM and use its answer as a starting point. I don't use jq nearly enough to know its language by memory.)