Show HN: VAM Seek – 2D video navigation grid, 15KB, zero server load

39 points, posted 17 hours ago
by haasiy

13 Comments

Scaevolus

10 hours ago

Client-side frame extraction is far too slow to be usable for large volumes of data.

You want to precompute the contact sheets and serve them to users. You can encode them with VP9, mux to IVF format, and use the WebCodecs API to decode them in the browser (2000B-3000B per 240x135 frame, so ~3MB/hour for a thumbnail every 4 seconds). Alternatively, you can make the contact sheets with JPEG, but there are dimension restrictions, reflow is slightly fiddly, and it doesn't exploit inter-frame compression.
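For illustration, a minimal sketch of that decode path in TypeScript, assuming the thumbnails are served as a single VP9 stream muxed into an IVF file (the codec string and the keyframe handling are assumptions, not a tested pipeline):

    // Fetch a precomputed VP9-in-IVF contact sheet and decode every frame
    // with the WebCodecs API. All names here are illustrative.
    async function decodeIvfThumbnails(url: string): Promise<VideoFrame[]> {
      const buf = new Uint8Array(await (await fetch(url)).arrayBuffer());
      const view = new DataView(buf.buffer);
      const frames: VideoFrame[] = [];

      const decoder = new VideoDecoder({
        output: (frame) => frames.push(frame), // caller must close() each frame
        error: (e) => console.error(e),
      });
      decoder.configure({ codec: "vp09.00.10.08" }); // VP9 profile 0, 8-bit

      // IVF layout: 32-byte file header, then per-frame 12-byte headers
      // (4-byte little-endian size + 8-byte timestamp) followed by the data.
      let offset = 32;
      while (offset + 12 <= buf.length) {
        const size = view.getUint32(offset, true);
        const timestamp = Number(view.getBigUint64(offset + 4, true));
        decoder.decode(new EncodedVideoChunk({
          // Assumes the first frame is the only keyframe; an intra-only
          // encode could mark every chunk as "key" instead.
          type: offset === 32 ? "key" : "delta",
          timestamp,
          data: buf.subarray(offset + 12, offset + 12 + size),
        }));
        offset += 12 + size;
      }
      await decoder.flush();
      return frames;
    }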

I made a simple Python/Flask utility for lossless cutting that uses this to present a giant contact sheet to quickly select portions of a video to extract.

haasiy

5 hours ago

Actually, I started with the precomputing approach you mentioned. But I realized that for many users, setting up a backend to process videos or managing pre-generated assets is a huge barrier.

I purposely pivoted to 100% client-side extraction to achieve zero server load and a one-line integration. While it has limits with massive data, the 'plug-and-play' nature is the core value of VAM-Seek. I'd rather give people a tool they can use in 5 seconds than a high-performance system that requires 5 minutes of server config.

fc417fc802

10 hours ago

> All frame extraction happens client-side via canvas – no server processing, no pre-generated thumbnails.

Doesn't that mean the client has to grab a bunch of extra data when it first opens the page, at least if the user calls up the seek feature? You effectively have to grab frames from throughout the video to generate the initial batch. It seems like it would make more sense to have server-side thumbnails here, as long as they're reasonably sparse and low quality.

Although I admit that one line client side integration is quite compelling.

haasiy

5 hours ago

Exactly. I view this cache similarly to how a browser (or Google Image Search) caches thumbnails locally. Since I'm only storing small Canvas elements, the memory footprint is much smaller than the video itself. To keep it sustainable, I'm planning to implement a trigger to clear the cache whenever the video source changes, ensuring the client's memory stays fresh.
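Roughly, the lifecycle I have in mind looks like this (a sketch only; the names are illustrative, not VAM-Seek's actual API):

    // Thumbnails cached per timestamp; dropped when the <video> source changes.
    const thumbCache = new Map<number, HTMLCanvasElement>();

    function watchSource(video: HTMLVideoElement): void {
      // "emptied" fires when the media element resets, e.g. on a new src.
      video.addEventListener("emptied", () => thumbCache.clear());
    }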

haasiy

5 hours ago

I’ve read all your feedback, and I appreciate the different perspectives.

To be honest, I struggled a lot with how to build this. I have deep respect for professional craftsmanship, yet I chose a path of close collaboration with AI.

I wrote down my internal conflict and the journey of how VAM-Seek came to be in this personal log. I’d be honored if you could read it and see what I was feeling during the process: https://haasiy.main.jp/note/blog/llm-coding-journey.html

It’s just a record of one developer trying to find a way forward.

dotancohen

10 hours ago

This looks absolutely terrific if it is performant. How long does this library take to generate the thumbnails and the seek bar for e.g. a 60 minute video, on 8-year-old desktop hardware? Or on older mobile devices? For reference, my current desktop is from 2012.

haasiy

5 hours ago

Love the setup! A 2012 machine is a classic.

To answer your question: VAM-Seek doesn't pre-render the entire 60 minutes. It only extracts frames for the visible grid (e.g., 24-48 thumbnails) using the browser's hardware acceleration via Canvas.
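The extraction loop looks roughly like this (a sketch of the approach; the names and grid size are illustrative, not VAM-Seek's actual API):

    // Seek the <video> element, wait for the "seeked" event, then copy the
    // current frame into a small canvas.
    async function grabFrame(video: HTMLVideoElement, t: number): Promise<HTMLCanvasElement> {
      await new Promise<void>((resolve) => {
        const onSeeked = () => { video.removeEventListener("seeked", onSeeked); resolve(); };
        video.addEventListener("seeked", onSeeked);
        video.currentTime = t;
      });
      const canvas = document.createElement("canvas");
      canvas.width = 240;
      canvas.height = 135;
      canvas.getContext("2d")!.drawImage(video, 0, 0, canvas.width, canvas.height);
      return canvas;
    }

    // Fill only the visible grid, e.g. 24 evenly spaced midpoints.
    async function fillGrid(video: HTMLVideoElement, cells = 24): Promise<HTMLCanvasElement[]> {
      const thumbs: HTMLCanvasElement[] = [];
      for (let i = 0; i < cells; i++) {
        thumbs.push(await grabFrame(video, ((i + 0.5) * video.duration) / cells));
      }
      return thumbs;
    }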

On older hardware, the bottleneck is usually the browser's video seeking speed, not the generation itself. Even on a 2012 desktop, it should populate the grid in a few seconds. If it takes longer... well, that might be your PC's way of asking for a retirement plan! ;)

littlestymaar

12 hours ago

The idea is very compelling, it solves a real use-case. I will definitely take inspiration from that.

However, the execution is meh. The UX is terrible (on mobile at least), and the code and documentation are an overly verbose mess. The entire project ought to fit within the size of the AI-generated readme. Using AI for exploration and prototyping is fine, but you can't ship that slop, mate; you need to do the polishing yourself.

haasiy

5 hours ago

I intentionally used AI to draft the README so it's optimized for other AI tools to consume. My priority wasn't 'polishing' for human aesthetics, but rather hitting the 15KB limit and ensuring 100% client-side execution. I'd rather spend my time shipping the next feature than formatting text.

littlestymaar

4 hours ago

First, you're misunderstanding what I mean by “polishing”: I'm talking about making sure it actually works.

Then, improving the signal-to-noise ratio of your project actually helps with “shipping the next feature”, as LLMs themselves get lost in the noise they make.

Finally, if you want people to use your project, you need to show us that it's better than what they can make by themselves. That's especially true now that AI reduces the cost of building new stuff. If you can't work with Claude to build something better than what Claude builds on its own, your project isn't worth more than its token count.

haasiy

3 hours ago

I have to stand my ground here. Reducing complex functionality to 15KB is not just about 'generating code'; it's about an architecture that AI cannot conceive on its own.

My role was to architect the bridge between UI/UX design and the underlying video data processing. Handling frame extraction via Canvas, managing memory, and ensuring a seamless seek experience without any backend support requires a deep understanding of how these layers interact.

Simply connecting a backend to a UI might be common, but eliminating the backend entirely while maintaining the utility is a high-level engineering choice. AI was my hammer, but I was the one who designed the bridge. To say this is worth no more than its token count ignores the most difficult part: the intent and the structural simplification that makes it usable for others in a single line of code.

littlestymaar

3 hours ago

> Reducing a complex functionality into 15KB is not just about 'generating code'—it's about an architecture that AI cannot conceive on its own.

Ironic.

haasiy

16 minutes ago

I see, so if that's what you're resorting to, it means you have no comeback to my rebuttal? Since you keep going on about AI's words, tokens, and code, I'll deliberately answer in my native language. Using AI to deal with petty complaints is the global standard now. Ironic, isn't it? lol