Show HN: VAM Seek – 2D video navigation grid, 15KB, zero server load

39 points, posted 17 hours ago
by haasiy

13 Comments

Scaevolus

10 hours ago

Client-side frame extraction is far too slow to be usable for large volumes of data.

You want to precompute the contact sheets and serve them to users. You can encode them with VP9, mux to IVF format, and use the WebCodecs API to decode them in the browser (2000B-3000B per 240x135 frame, so ~3MB/hour for a thumbnail every 4 seconds). Alternatively, you can make the contact sheets with JPEG, but there are dimension restrictions, reflow is slightly fiddly, and it doesn't exploit inter-frame compression.
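For illustration, a minimal sketch of that decode path in TypeScript, assuming the thumbnails are served as a single VP9 stream muxed into an IVF file (the codec string and the keyframe handling are assumptions, not a tested pipeline):

    // Fetch a precomputed VP9-in-IVF contact sheet and decode every frame
    // with the WebCodecs API. All names here are illustrative.
    async function decodeIvfThumbnails(url: string): Promise<VideoFrame[]> {
      const buf = new Uint8Array(await (await fetch(url)).arrayBuffer());
      const view = new DataView(buf.buffer);
      const frames: VideoFrame[] = [];

      const decoder = new VideoDecoder({
        output: (frame) => frames.push(frame), // caller must close() each frame
        error: (e) => console.error(e),
      });
      decoder.configure({ codec: "vp09.00.10.08" }); // VP9 profile 0, 8-bit

      // IVF layout: 32-byte file header, then per-frame 12-byte headers
      // (4-byte little-endian size + 8-byte timestamp) followed by the data.
      let offset = 32;
      while (offset + 12 <= buf.length) {
        const size = view.getUint32(offset, true);
        const timestamp = Number(view.getBigUint64(offset + 4, true));
        decoder.decode(new EncodedVideoChunk({
          // Assumes the first frame is the only keyframe; an intra-only
          // encode could mark every chunk as "key" instead.
          type: offset === 32 ? "key" : "delta",
          timestamp,
          data: buf.subarray(offset + 12, offset + 12 + size),
        }));
        offset += 12 + size;
      }
      await decoder.flush();
      return frames;
    }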

I made a simple Python/Flask utility for lossless cutting that uses this to present a giant contact sheet to quickly select portions of a video to extract.

haasiy

5 hours ago

Actually, I started with the precomputing approach you mentioned. But I realized that for many users, setting up a backend to process videos or managing pre-generated assets is a huge barrier.

I purposely pivoted to 100% client-side extraction to achieve zero server load and a one-line integration. While it has limits with massive data, the 'plug-and-play' nature is the core value of VAM-Seek. I'd rather give people a tool they can use in 5 seconds than a high-performance system that requires 5 minutes of server config.

fc417fc802

10 hours ago

> All frame extraction happens client-side via canvas – no server processing, no pre-generated thumbnails.

Doesn't that mean the client has to grab a bunch of extra data when it first opens the page, at least if the user calls up the seek feature? You effectively have to grab frames from throughout the video to generate the initial batch. It seems like it would make more sense to have server-side thumbnails here, as long as they're reasonably sparse and low quality.

Although I admit that one line client side integration is quite compelling.

haasiy

5 hours ago

Exactly. I view this cache similarly to how a browser (or Google Image Search) caches thumbnails locally. Since I'm only storing small Canvas elements, the memory footprint is much smaller than the video itself. To keep it sustainable, I'm planning to implement a trigger to clear the cache whenever the video source changes, ensuring the client's memory stays fresh.
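Roughly, the lifecycle I have in mind looks like this (a sketch only; the names are illustrative, not VAM-Seek's actual API):

    // Thumbnails cached per timestamp; dropped when the <video> source changes.
    const thumbCache = new Map<number, HTMLCanvasElement>();

    function watchSource(video: HTMLVideoElement): void {
      // "emptied" fires when the media element resets, e.g. on a new src.
      video.addEventListener("emptied", () => thumbCache.clear());
    }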

haasiy

5 hours ago

I’ve read all your feedback, and I appreciate the different perspectives.

To be honest, I struggled a lot with how to build this. I have deep respect for professional craftsmanship, yet I chose a path of close collaboration with AI.

I wrote down my internal conflict and the journey of how VAM-Seek came to be in this personal log. I’d be honored if you could read it and see what I was feeling during the process: https://haasiy.main.jp/note/blog/llm-coding-journey.html

It’s just a record of one developer trying to find a way forward.

dotancohen

10 hours ago

This looks absolutely terrific if it is performant. How long does this library take to generate the thumbnails and the seek bar for e.g. a 60 minute video, on 8-year-old desktop hardware? Or on older mobile devices? For reference, my current desktop is from 2012.

haasiy

5 hours ago

Love the setup! A 2012 machine is a classic.

To answer your question: VAM-Seek doesn't pre-render the entire 60 minutes. It only extracts frames for the visible grid (e.g., 24-48 thumbnails) using the browser's hardware acceleration via Canvas.
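The extraction loop looks roughly like this (a sketch of the approach; the names and grid size are illustrative, not VAM-Seek's actual API):

    // Seek the <video> element, wait for the "seeked" event, then copy the
    // current frame into a small canvas.
    async function grabFrame(video: HTMLVideoElement, t: number): Promise<HTMLCanvasElement> {
      await new Promise<void>((resolve) => {
        const onSeeked = () => { video.removeEventListener("seeked", onSeeked); resolve(); };
        video.addEventListener("seeked", onSeeked);
        video.currentTime = t;
      });
      const canvas = document.createElement("canvas");
      canvas.width = 240;
      canvas.height = 135;
      canvas.getContext("2d")!.drawImage(video, 0, 0, canvas.width, canvas.height);
      return canvas;
    }

    // Fill only the visible grid, e.g. 24 evenly spaced midpoints.
    async function fillGrid(video: HTMLVideoElement, cells = 24): Promise<HTMLCanvasElement[]> {
      const thumbs: HTMLCanvasElement[] = [];
      for (let i = 0; i < cells; i++) {
        thumbs.push(await grabFrame(video, ((i + 0.5) * video.duration) / cells));
      }
      return thumbs;
    }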

On older hardware, the bottleneck is usually the browser's video seeking speed, not the generation itself. Even on a 2012 desktop, it should populate the grid in a few seconds. If it takes longer... well, that might be your PC's way of asking for a retirement plan! ;)

littlestymaar

12 hours ago

The idea is very compelling, it solves a real use-case. I will definitely take inspiration from that.

However, the execution is meh. The UX is terrible (on mobile at least), and the code and documentation are an overly verbose mess. The entire project ought to fit within the size of the AI-generated readme. Using AI for exploration and prototyping is fine, but you can't ship that slop, mate; you need to do the polishing yourself.

haasiy

5 hours ago

I intentionally used AI to draft the README so it's optimized for other AI tools to consume. My priority wasn't 'polishing' for human aesthetics, but rather hitting the 15KB limit and ensuring 100% client-side execution. I'd rather spend my time shipping the next feature than formatting text.

littlestymaar

4 hours ago

First, you're misunderstanding what I mean by “polishing”: I'm talking about making sure it actually works.

Then, improving the signal-to-noise ratio of your project actually helps with “shipping the next feature”, as LLMs themselves get lost in the noise they make.

Finally, if you want people to use your project, you need to show us that it's better than what they can make by themselves. That's especially true now that AI reduces the cost of building new stuff. If you can't work with Claude to build something better than what Claude builds on its own, your project isn't worth more than its token count.

haasiy

3 hours ago

I have to stand my ground here. Reducing complex functionality to 15KB is not just about 'generating code'; it's about an architecture that AI cannot conceive on its own.

My role was to architect the bridge between UI/UX design and the underlying video data processing. Handling frame extraction via Canvas, managing memory, and ensuring a seamless seek experience without any backend support requires a deep understanding of how these layers interact.

Simply connecting a backend to a UI might be common, but eliminating the backend entirely while maintaining the utility is a high-level engineering choice. AI was my hammer, but I was the one who designed the bridge. To say this is worth no more than its token count ignores the most difficult part: the intent and the structural simplification that makes it usable for others in a single line of code.

littlestymaar

3 hours ago

> Reducing a complex functionality into 15KB is not just about 'generating code'—it's about an architecture that AI cannot conceive on its own.

Ironic.

haasiy

16 minutes ago

I see, so if that's what you're resorting to, it means you have no comeback to my rebuttal? Since you keep going on about AI's words, tokens, and code, I'll deliberately answer in my native language. Using AI to deal with petty complaints is the global standard now. Ironic, isn't it? lol