Show HN: Vomitorium – all of your project in 1 text file

79 points, posted 5 days ago
by jwally

55 Comments

ghgr

4 days ago

As an alternative to (npm -g)'ing, here are some potentially useful coreutils one-liners I've been using for a similar purpose:

- Dump all .py files into out.txt (for copy/paste into an LLM)

> find . -name "*.py" -exec cat {} + > out.txt

- Sort all .py files by number of lines

> find . -name '*.py' -exec wc -l {} + | sort -n
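
A small Python variant of the same idea (all names here are illustrative, not part of the tool): prefix each file's contents with a header line, since bare `cat` output gives the LLM no file boundaries:

```python
from pathlib import Path

def dump_tree(root=".", pattern="*.py"):
    """Concatenate all matching files, with a header naming each one."""
    parts = []
    for path in sorted(Path(root).rglob(pattern)):
        parts.append(f"# --- {path} ---\n{path.read_text()}")
    return "\n".join(parts)
```

Run it from the project root and paste the result; unlike bare `cat`, every chunk stays attributable to its source file.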

Diti

3 days ago

I was going to comment this. “What’s wrong with `cat`”, whose job is literally to concatenate files? Or even [uncompressed] `tar` archives, which are basically just a list of files with some headers?

scbrg

3 days ago

Never underestimate the node community's willingness to ignore the existing tech stack and reinvent 50 year old tools. It's peak NIH.

c-fe

4 days ago

Love this. I created (half-jokingly, but only half) the concept of a monofile (inspired by our monorepo) in our team. I have not managed to convince my colleagues to switch yet, but maybe this package can help. Unironically, I find that in larger Python projects, combining various related sub-100-LOC files into one big sub-1000-LOC file can work magic on circular import errors and remove hundreds of lines of import statements.

SuchAnonMuchWow

3 days ago

To help with circular imports, we switched a few years ago to lazily importing submodules on demand, and never switched back.

Just add this to your __init__.py files:

    import importlib

    def __getattr__(submodule_name):
        return importlib.import_module('.' + submodule_name, __package__)

Then just import the root module and use it without ever needing to import individual submodules:

    import foo

    def bar():
        # foo.subfoo is imported when the function is first executed,
        # not when the file is parsed, so no circular import happens
        return foo.subfoo.bar()

hobs

3 days ago

Doesn't that mean your editor support is crap though?

SuchAnonMuchWow

3 days ago

Not at all. Sublime is perfectly fine with it.

I suspect that, from the usage in the code, it knows there is a module foo and a submodule subfoo with a function bar() in it, and it can look directly in the file for the definition of bar().

It would be another story if we used this opportunity to mangle the submodule names, for example, but that's the kind of hidden control flow that nobody wants in their codebase.

Also, it is not some dark art of importing or anything: it is pretty standard at this point, since it's one of the sanest ways of breaking circular dependencies between your modules, and the ability to overload a module's __getattr__ was introduced specifically for this use case. (I couldn't find the specific PEP that introduced it, sorry)

aunderscored

3 days ago

It does, which is why this is more easily done by importing exact bits or using a single file

sureglymop

4 days ago

I usually do this with docker/podman compose files for dev environments.

I see people creating all kinds of mounts and volumes, but I just embed files inline under the `configs` top-level key. I even embed shell scripts that way for one-shot/initialization tasks.

The goal is to just have one compose.yml file that the developer can spin up for a local development reproduction of what they need. It's quite nice.
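
A minimal sketch of that pattern (service and file names are hypothetical; inline `content` under `configs` requires a reasonably recent Compose):

```yaml
# compose.yml: an init script embedded inline, no extra files on disk
configs:
  init_db:
    content: |
      #!/bin/sh
      echo "running one-shot initialization"

services:
  db:
    image: postgres:16
    configs:
      - source: init_db
        target: /docker-entrypoint-initdb.d/init.sh
```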

phito

4 days ago

I like splitting large compose files into smaller units and including them in the main compose file
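
A sketch of that layout, assuming a Compose version that supports the top-level `include` element (file names are illustrative):

```yaml
# compose.yml: the main file pulls in the smaller units
include:
  - compose.db.yml
  - compose.monitoring.yml

services:
  app:
    image: myapp:dev
```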

Muromec

3 days ago

I once had a 4k-line JavaScript file (a Vuex module), which I navigated using / in vim, and which came with another 20k lines of tests (also in a single file). I would say 5k lines is the real ceiling.

__MatrixMan__

3 days ago

I've been dreaming of a tool which resembles this, at least in spirit.

I want to figure out how to structure a codebase such that a failing test can spit out a CID for that failure such that it can be remotely recreated (you'd have to be running ipfs so that the remote party can pull the content from you, or maybe you push it to some kind of hub before you share it).

It would be the files relevant to that failure--both code files and data files, stdin, env vars... a reproducible build of a test result.

It would be handy for reporting bugs or getting LLM help. The remote party could respond with a similar "try this" hash which the tooling would then understand how to apply (fetching the necessary bits from their machine, or the hub). Sort of like how Unison resolves functions by cryptographic hash, except this is a link to a function call, so it's got inputs and outputs too.

Of course that's a long way from vomiting everything into a text file, I need to establish functional dependency at as small a granularity as possible, but this feels like the first step on a path that eventually gets us there.

ffsm8

3 days ago

Hmm, you could probably make a proof of concept on a weekend specifically in the typescript/JavaScript ecosystem, as it's already heavily reliant on bundlers.

The process could be

1. defining a new/temporary bundler entry point

2. copying the failing code into the file

3. Bundle without minification

It'd probably be best to reduce scope by limiting it to a specific testing framework and building it as an extension, e.g. for Jest

__MatrixMan__

3 days ago

You're talking sense, but I'm kinda wanting to do it at the subprocess level so that caller and callee need not use the same language (I was talking in terms of tests but tests are just a special kind of function).

Whether to use nodejs or python or rust (and which version thereof) will be as much a part of the bundled function as its code. I figure I'll wrap nix so it can replicate the environments, then I'll just have to do the runtime stuff.

scioto

3 days ago

It'd be nice if something similar were available to traverse, say, directories of writings in Markdown, Word, LibreOffice, etc., and output a single text file so I have all my writings in one place. Plus allow plug-ins to extract from more exotic file types not originally included.
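
A sketch of what that could look like, assuming a plug-in registry keyed by file extension (everything here is hypothetical; real Word/LibreOffice extractors could shell out to a converter such as pandoc):

```python
from pathlib import Path

EXTRACTORS = {}

def extractor(*exts):
    """Plug-in hook: register an extraction function for the given extensions."""
    def register(fn):
        for ext in exts:
            EXTRACTORS[ext] = fn
        return fn
    return register

@extractor(".md", ".txt")
def read_plain(path):
    return path.read_text()

def collect(root):
    """Walk the tree and concatenate whatever the registered extractors handle."""
    chunks = []
    for path in sorted(Path(root).rglob("*")):
        fn = EXTRACTORS.get(path.suffix)
        if fn is not None and path.is_file():
            chunks.append(f"## {path.name}\n{fn(path)}")
    return "\n\n".join(chunks)
```

More exotic formats would just be further `@extractor(...)` registrations, which is the plug-in behaviour the comment asks for.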

vdm

3 days ago

    shopt -s globstar
    tail -n+1 **/*.py | pbcopy

Charon77

3 days ago

Isn't this a tar file?

Cyberdog

3 days ago

That's what I was thinking too. It looks like someone just reinvented tar, and given that it's a JavaScript thing I was wondering if it's a zoomer who didn't know tar existed and whether the HN crowd would set them straight. But then I come into the comments here and people are posting about how absolutely brilliant it is, so surely I'm missing something… right?

theviolacode

2 days ago

Cool! I'd like to see an indication of the total number of tokens in the output, so I know right away which LLMs I can use this prompt with, or, if it's too large, relaunch the script excluding more files to reduce the token count of the output
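
As a rough illustration of such a readout (a heuristic sketch, not a feature of the tool): an exact count needs the target model's tokenizer, but ~4 characters per token is a common ballpark for English text and code:

```python
def estimate_tokens(text: str) -> int:
    # Rule-of-thumb estimate: roughly one token per 4 characters.
    return len(text) // 4

def fits_context(text: str, context_window: int) -> bool:
    # Quick check before pasting into a model with the given window.
    return estimate_tokens(text) <= context_window
```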

mosselman

3 days ago

I can imagine the token counts being off the charts. How would an LLM handle this input? LLM output quality already drops quite hard at about 3,000 tokens, let alone 128k

jonathaneunice

3 days ago

Depends on the LLM, perhaps, and/or the problem being solved. I get very good output from 10K–25K token submissions to Anthropic's Claude API.

mp5

2 days ago

One feature you could add is allowing the user to map changes in the concatenated file back to the original files. For example, if an LLM edits the concatenated file, I would want it to return the corresponding filenames and line numbers of the original files.
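
One way that mapping could be sketched (function names here are hypothetical): record each file's starting line while concatenating, then translate a line number reported against the combined file back to (original file, original line):

```python
def concat_with_index(files):
    """files: list of (name, text) pairs -> (combined_text, index)."""
    index = []       # (1-based start line in the combined output, file name)
    out_lines = []
    for name, text in files:
        index.append((len(out_lines) + 1, name))
        out_lines.extend(text.splitlines())
    return "\n".join(out_lines), index

def locate(index, line_no):
    """Map a 1-based line in the combined file back to (file, line)."""
    for start, name in reversed(index):
        if line_no >= start:
            return name, line_no - start + 1
    raise ValueError("line precedes the first file")
```

An LLM edit reported at combined line N could then be replayed against `locate(index, N)`.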

acoretchi

4 days ago

Repopack with Claude projects has been a game changer for me on repository-wide refactors.

wasyl

3 days ago

Seems like repopack only packs the repo. How do you apply the refactors back to the project? Is it something that Claude projects does automatically somehow?

samrolken

3 days ago

I have a bash script which is very similar to this, except instead of dumping it all into one file, it opens all the matched files as tabs in Zed. Since Zed's AI features let you dump all, or a subset, of open tabs into context, this works great. It gives me a chance to curate the context a little more. And what I'm working on is probably already in an open tab anyway.

breck

5 days ago

This made me laugh. Thanks!

Can you go 1 more step? Is there a way to not just dump someone's project into a plain text file, but somehow intelligently craft it into a ready-to-go prompt? I could use that!

Here's my user test: https://www.youtube.com/watch?v=sTPTJ4ladiI

Terretta

4 days ago

> Can you go 1 more step? Is there a way to not just dump someone's project into a plain text file, but somehow intelligently craft it into a ready-to-go prompt? I could use that!

https://aider.chat/

It does this, and smartly, using tree-sitter, for quite a few tree-sitter supported languages.

breck

4 days ago

Looks very interesting! Thanks for the link!

leovailati

3 days ago

We use a C compiler for embedded systems that doesn't support link-time optimization (unless you pay for the pro version, that is). I have been thinking about a tool like this that merges all C source files for compilation.

professoretc

3 days ago

That's called a "unity" build, isn't it? I was under the impression that it was a relatively well-known technique, such that there are existing tools to merge a set of source files into a single .c file.

rramadass

3 days ago

Unless I am misunderstanding you, you could do this easily by #including all your a.c, b.c, etc. into one file input.c and feeding that to the compiler.

We did this for a home-grown SoC with a gcc port for which there was no linker.

aetherspawn

3 days ago

Surely, with storage being pretty slow and everything, it would be better to compress it into an archive with really basic compression?

jonplackett

3 days ago

This is really helpful. I immediately thought it'd be useful for sending off to ChatGPT, and then saw that's what it's actually for. Thank you!

gopi

3 days ago

Shouldn't this work?

find /path/to/directory -type f -exec cat {} + > output.txt

lynx23

3 days ago

vim-ai basically supports this use case out of the box. All you need is an index file listing all the files you want included, starting with

>>> include

guidedlight

3 days ago

This is probably very useful for use with LLMs.

istvanmeszaros

4 days ago

Love the name :D.

thih9

4 days ago

> A vomitorium is a passage situated below or behind a tier of seats in an amphitheatre or a stadium through which large crowds can exit rapidly at the end of an event.

> A commonly held but erroneous notion is that Ancient Romans designated spaces called vomitoria for the purpose of literal vomiting, as part of a binge-and-purge cycle

https://en.wikipedia.org/wiki/Vomitorium

lelandfe

3 days ago

In which the thing being spewed is people

ziofill

3 days ago

the .sick file extension is a nice touch ^^

donw

3 days ago

Be careful with the name, McDonald’s might sue you for copyright infringement.

frereubu

3 days ago

The name links up nicely with AI enshittification. Although if you wanted to be pedantic, for that metaphor to work you'd really want to call it "gorge" or something more related to ingestion rather than vomiting. (I'm aware that a vomitorium was the exit from a Roman stadium, so it's not really about throwing up either).