Show HN: Vomitorium – all of your project in 1 text file

79 points, posted 5 days ago
by jwally

55 Comments

ghgr

4 days ago

As an alternative to (npm -g)'ing, here are some potentially useful coreutils one-liners I've been using for a similar purpose:

- Dump all .py files into out.txt (for copy/paste into an LLM)

> find . -name "*.py" -exec cat {} + > out.txt

- Sort all .py files by number of lines

> find . -name '*.py' -exec wc -l {} + | sort -n
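
A small Python variant of the same idea (all names here are illustrative, not part of the tool): prefix each file's contents with a header line, since bare `cat` output gives the LLM no file boundaries:

```python
from pathlib import Path

def dump_tree(root=".", pattern="*.py"):
    """Concatenate all matching files, with a header naming each one."""
    parts = []
    for path in sorted(Path(root).rglob(pattern)):
        parts.append(f"# --- {path} ---\n{path.read_text()}")
    return "\n".join(parts)
```

Run it from the project root and paste the result; unlike bare `cat`, every chunk stays attributable to its source file.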

Diti

3 days ago

I was going to comment this. “What’s wrong with `cat`”, whose job is literally to concatenate files? Or even [uncompressed] `tar` archives, which are basically just a list of files with some headers?

scbrg

3 days ago

Never underestimate the node community's willingness to ignore the existing tech stack and reinvent 50 year old tools. It's peak NIH.

c-fe

4 days ago

Love this. I created (half-jokingly, but only half) the concept of a monofile (inspired by our monorepo) in our team. I have not managed to convince my colleagues to switch yet, but maybe this package can help. Unironically, I find that in larger Python projects, combining various related sub-100-LOC files into one big sub-1000-LOC file can work magic on circular import errors and remove hundreds of lines of import statements.

SuchAnonMuchWow

3 days ago

To help with circular imports, we switched a few years ago to lazily importing submodules on demand, and never switched back.

Just add this to your __init__.py files:

    import importlib

    def __getattr__(submodule_name):
        return importlib.import_module('.' + submodule_name, __package__)

Then just import the root module and use it without ever needing to import individual submodules:

    import foo

    def bar():
        # foo.subfoo is imported when the function is first executed,
        # not when the file is parsed, so no circular import happens
        return foo.subfoo.bar()

hobs

3 days ago

Doesn't that mean your editor support is crap though?

SuchAnonMuchWow

3 days ago

Not at all. Sublime is perfectly fine with it.

I suspect that, from the usage in the code, it knows there is a module foo and a submodule subfoo with a function bar() in it, and it can look directly in the file for the definition of bar().

It would be another story if we used this opportunity to mangle the submodule names, for example, but that's the kind of hidden control flow that nobody wants in their codebase.

Also, it is not some dark art of importing or anything: it is pretty standard at this point, since it's one of the sanest ways of breaking circular dependencies between your modules, and the ability to overload a module's __getattr__ was introduced specifically for this use case. (I couldn't find the specific PEP that introduced it, sorry)

aunderscored

3 days ago

It does, which is why this is more easily done by importing exact bits or using a single file

sureglymop

4 days ago

I usually do this with docker/podman compose files for dev environments.

I see people creating all kinds of mounts and volumes, but I just embed files inline under the `configs` top-level key. I even embed shell scripts that way for one-shot/initialization tasks.

The goal is to just have one compose.yml file that the developer can spin up for a local development reproduction of what they need. It's quite nice.
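
A minimal sketch of that pattern (service and file names are hypothetical; inline `content` under `configs` requires a reasonably recent Compose):

```yaml
# compose.yml: an init script embedded inline, no extra files on disk
configs:
  init_db:
    content: |
      #!/bin/sh
      echo "running one-shot initialization"

services:
  db:
    image: postgres:16
    configs:
      - source: init_db
        target: /docker-entrypoint-initdb.d/init.sh
```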

phito

4 days ago

I like splitting large compose files into smaller units and including them in the main compose file
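
A sketch of that layout, assuming a Compose version that supports the top-level `include` element (file names are illustrative):

```yaml
# compose.yml: the main file pulls in the smaller units
include:
  - compose.db.yml
  - compose.monitoring.yml

services:
  app:
    image: myapp:dev
```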

Muromec

3 days ago

I once had a 4k-line JavaScript file (a Vuex module), which I navigated using / in vim, and which came with another 20k lines of tests (also in a single file). I would say 5k lines is the real ceiling.

__MatrixMan__

3 days ago

I've been dreaming of a tool which resembles this, at least in spirit.

I want to figure out how to structure a codebase such that a failing test can spit out a CID for that failure such that it can be remotely recreated (you'd have to be running ipfs so that the remote party can pull the content from you, or maybe you push it to some kind of hub before you share it).

It would be the files relevant to that failure--both code files and data files, stdin, env vars... a reproducible build of a test result.

It would be handy for reporting bugs or getting LLM help. The remote party could respond with a similar "try this" hash which the tooling would then understand how to apply (fetching the necessary bits from their machine, or the hub). Sort of like how Unison resolves functions by cryptographic hash, except this is a link to a function call, so it's got inputs and outputs too.

Of course that's a long way from vomiting everything into a text file, I need to establish functional dependency at as small a granularity as possible, but this feels like the first step on a path that eventually gets us there.

ffsm8

3 days ago

Hmm, you could probably make a proof of concept on a weekend specifically in the typescript/JavaScript ecosystem, as it's already heavily reliant on bundlers.

The process could be

1. defining a new/temporary bundler entry point

2. copying the failing code into the file

3. Bundle without minification

It'd probably be best to reduce scope by limiting it to a specific testing framework and building it as an extension, e.g. for Jest

__MatrixMan__

3 days ago

You're talking sense, but I'm kinda wanting to do it at the subprocess level so that caller and callee need not use the same language (I was talking in terms of tests but tests are just a special kind of function).

Whether to use nodejs or python or rust (and which version thereof) will be as much a part of the bundled function as its code. I figure I'll wrap nix so it can replicate the environments, then I'll just have to do the runtime stuff.

scioto

3 days ago

It'd be nice if something similar were available to traverse, say, directories of writings in Markdown, Word, LibreOffice, etc., and output a single text file so I have all my writings in one place. Plus allow plug-ins to extract from more exotic file types not originally included.
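
A sketch of what that could look like, assuming a plug-in registry keyed by file extension (everything here is hypothetical; real Word/LibreOffice extractors could shell out to a converter such as pandoc):

```python
from pathlib import Path

EXTRACTORS = {}

def extractor(*exts):
    """Plug-in hook: register an extraction function for the given extensions."""
    def register(fn):
        for ext in exts:
            EXTRACTORS[ext] = fn
        return fn
    return register

@extractor(".md", ".txt")
def read_plain(path):
    return path.read_text()

def collect(root):
    """Walk the tree and concatenate whatever the registered extractors handle."""
    chunks = []
    for path in sorted(Path(root).rglob("*")):
        fn = EXTRACTORS.get(path.suffix)
        if fn is not None and path.is_file():
            chunks.append(f"## {path.name}\n{fn(path)}")
    return "\n\n".join(chunks)
```

More exotic formats would just be further `@extractor(...)` registrations, which is the plug-in behaviour the comment asks for.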

vdm

3 days ago

    shopt -s globstar
    tail -n+1 **/*.py | pbcopy

Charon77

3 days ago

Isn't this a tar file?

Cyberdog

3 days ago

That's what I was thinking too. It looks like someone just reinvented tar, and given that it's a JavaScript thing I was wondering if it's a zoomer who didn't know tar existed and whether the HN crowd would set them straight. But then I come into the comments here and people are posting about how absolutely brilliant it is, so surely I'm missing something… right?

theviolacode

2 days ago

Cool! I'd like to see an indication of the total number of tokens in the output, so I know right away which LLMs I can use this prompt with, or, if it's too large, relaunch the script excluding more files to reduce the token count of the output
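
As a rough illustration of such a readout (a heuristic sketch, not a feature of the tool): an exact count needs the target model's tokenizer, but ~4 characters per token is a common ballpark for English text and code:

```python
def estimate_tokens(text: str) -> int:
    # Rule-of-thumb estimate: roughly one token per 4 characters.
    return len(text) // 4

def fits_context(text: str, context_window: int) -> bool:
    # Quick check before pasting into a model with the given window.
    return estimate_tokens(text) <= context_window
```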

mosselman

3 days ago

I can imagine the token counts being off the charts. How would an LLM handle this input? LLM output quality already drops quite hard at about 3,000 tokens, let alone 128k

jonathaneunice

3 days ago

Depends on the LLM, perhaps, and/or the problem being solved. I get very good output from 10K–25K token submissions to Anthropic's Claude API.

mp5

2 days ago

One feature you could add is allowing the user to map changes in the concatenated file back to the original files. For example, if an LLM edits the concatenated file, I would want it to return the corresponding filenames and line numbers of the original files.
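
One way that mapping could be sketched (function names here are hypothetical): record each file's starting line while concatenating, then translate a line number reported against the combined file back to (original file, original line):

```python
def concat_with_index(files):
    """files: list of (name, text) pairs -> (combined_text, index)."""
    index = []       # (1-based start line in the combined output, file name)
    out_lines = []
    for name, text in files:
        index.append((len(out_lines) + 1, name))
        out_lines.extend(text.splitlines())
    return "\n".join(out_lines), index

def locate(index, line_no):
    """Map a 1-based line in the combined file back to (file, line)."""
    for start, name in reversed(index):
        if line_no >= start:
            return name, line_no - start + 1
    raise ValueError("line precedes the first file")
```

An LLM edit reported at combined line N could then be replayed against `locate(index, N)`.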

acoretchi

4 days ago

Repopack with Claude projects has been a game changer for me on repository-wide refactors.

wasyl

3 days ago

Seems like repopack only packs the repo. How do you apply the refactors back to the project? Is it something that Claude projects does automatically somehow?

samrolken

3 days ago

I have a bash script which is very similar to this, except instead of dumping it all into one file, it opens all the matched files as tabs in Zed. Since Zed's AI features let you dump all, or a subset, of open tabs into context, this works great. It gives me a chance to curate the context a little more. And what I'm working on is probably already in an open tab anyway.

breck

5 days ago

This made me laugh. Thanks!

Can you go 1 more step? Is there a way to not just dump someone's project into a plain text file, but somehow intelligently craft it into a ready-to-go prompt? I could use that!

Here's my user test: https://www.youtube.com/watch?v=sTPTJ4ladiI

Terretta

4 days ago

> Can you go 1 more step? Is there a way to not just dump someone's project into a plain text file, but somehow intelligently craft it into a ready-to-go prompt? I could use that!

https://aider.chat/

It does this, and smartly, using tree-sitter, for quite a few tree-sitter supported languages.

breck

4 days ago

Looks very interesting! Thanks for the link!

leovailati

3 days ago

We use a C compiler for embedded systems that doesn't support link-time optimization (unless you pay for the pro version, that is). I have been thinking about a tool like this that merges all C source files for compilation.

professoretc

3 days ago

That's called a "unity" build, isn't it? I was under the impression that it was a relatively well-known technique, such that there are existing tools to merge a set of source files into a single .c file.

rramadass

3 days ago

Unless I am misunderstanding you, you could do this easily by #including all your a.c, b.c, etc. into one file input.c and feeding that to the compiler.

We did this for a home-grown SoC with a gcc port for which there was no linker.

aetherspawn

3 days ago

Surely, with storage being pretty slow and everything, it would be better to compress it into an archive with really basic compression?

jonplackett

3 days ago

This is really helpful. I immediately thought it'd be useful for sending off to ChatGPT, and then saw that's what it's actually for. Thank you!

gopi

3 days ago

Shouldn't this work?

find /path/to/directory -type f -exec cat {} + > output.txt

lynx23

3 days ago

vim-ai basically supports this use case out of the box. All you need is an index file listing all the files you want included, starting with

>>> include

guidedlight

3 days ago

This is probably very useful for use with LLMs.

istvanmeszaros

4 days ago

Love the name :D.

thih9

4 days ago

> A vomitorium is a passage situated below or behind a tier of seats in an amphitheatre or a stadium through which large crowds can exit rapidly at the end of an event.

> A commonly held but erroneous notion is that Ancient Romans designated spaces called vomitoria for the purpose of literal vomiting, as part of a binge-and-purge cycle

https://en.wikipedia.org/wiki/Vomitorium

lelandfe

3 days ago

In which the thing being spewed is people

ziofill

3 days ago

the .sick file extension is a nice touch ^^

donw

3 days ago

Be careful with the name, McDonald’s might sue you for copyright infringement.

frereubu

3 days ago

The name links up nicely with AI enshittification. Although if you wanted to be pedantic, for that metaphor to work you'd really want to call it "gorge" or something more related to ingestion rather than vomiting. (I'm aware that a vomitorium was the exit from a Roman stadium, so it's not really about throwing up either).