I indexed 669 GB of my GoPro videos using my M1 Max computer and local ML models

279 points, posted 10 hours ago
by iliashad

Item id: 48528029

65 Comments

asenna

7 hours ago

Funny, this is almost EXACTLY what I did a few days ago on the same machine using very similar techniques, and it was on the front page of HN as well:

https://news.ycombinator.com/item?id=48222733 https://blog.simbastack.com/indexed-a-year-of-video-locally/

I wasn't familiar with your project though, interesting stuff.

I'm trying to add more photography related features to Framedex but yeah there's so much we can do locally, exciting times.

iliashad

3 hours ago

That's great. I checked your article when it was on the front page because someone mentioned my project in the comments.

Good job on the article and the project. And yes, local models are getting better and better.

esjeon

3 hours ago

> Then, run the frame analysis pipeline, which will divide the video into separate video scenes (1s each, or 1fps)

> (…)

> Frames analyzed 57,537

Aha, it makes total sense. This number sounds much more reasonable than “669 GB”, since the actual total size of processed frames would be like 10-30 GB.

(Not downplaying anything. Doing-at-home always requires some math on practicality)

> Total compute time 67h 40m 42s

I’m just curious, though: are there any paid options that can accelerate this kind of process? Just spin up GPU instances?
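For anyone who wants to sanity-check those figures, here is a rough back-of-the-envelope sketch in Python. It assumes one analyzed frame per second of footage (as the quoted 1 fps setting suggests) and takes the 10-30 GB estimate at face value. The helper and variable names are illustrative, not taken from the project.

```python
def parse_duration(text: str) -> int:
    """Convert a string such as '67h 40m 42s' into whole seconds."""
    units = {"h": 3600, "m": 60, "s": 1}
    return sum(int(part[:-1]) * units[part[-1]] for part in text.split())


FRAMES = 57_537                                  # "Frames analyzed" from the post
COMPUTE_SECONDS = parse_duration("67h 40m 42s")  # "Total compute time" from the post

# Assumption: 1 analyzed frame corresponds to 1 second of footage.
footage_hours = FRAMES / 3600
seconds_per_frame = COMPUTE_SECONDS / FRAMES

# Implied average size per frame if the processed frames total 10-30 GB.
kb_per_frame_low = 10e9 / FRAMES / 1e3
kb_per_frame_high = 30e9 / FRAMES / 1e3

print(f"footage covered: {footage_hours:.1f} h")
print(f"compute per frame: {seconds_per_frame:.2f} s")
print(f"implied frame size: {kb_per_frame_low:.0f}-{kb_per_frame_high:.0f} KB")
```

Under those assumptions the run works out to roughly 16 hours of footage at a little over 4 seconds of compute per sampled frame.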

iliashad

3 hours ago

> Aha, it makes total sense. This number sounds much more reasonable than “669 GB”, since the actual total size of processed frames would be like 10-30 GB.

The reason is that “669 GB” is the total raw footage size. When I'm doing the video processing, I downscale each frame to 720p to make it much faster, and I don't need full original quality to get accurate results (as far as I know from my experiments).

> I’m just curious, though: are there any paid options that can accelerate this kind of process? Just spin up GPU instances?

So far, I've found that an NVIDIA GPU, for example an RTX 3060 with 12 GB of VRAM, is much faster than my M1 Max (I'm still working on optimizing for speed and accuracy).
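Sampling one frame per second and downscaling to 720p could be done with ffmpeg along these lines. This is a minimal sketch, not the project's actual code: the function name, the output naming pattern, and the JPEG quality value are assumptions. Only the `fps` and `scale` filters and the `-q:v` option are standard ffmpeg features.

```python
from pathlib import Path


def build_sampling_command(video: Path, out_dir: Path,
                           fps: int = 1, height: int = 720) -> list[str]:
    """Build an ffmpeg command that samples frames and downscales them.

    scale=-2:<height> keeps the aspect ratio and rounds the width to an
    even number, which most encoders expect.
    """
    pattern = out_dir / f"{video.stem}_%06d.jpg"  # hypothetical naming scheme
    return [
        "ffmpeg",
        "-i", str(video),
        "-vf", f"fps={fps},scale=-2:{height}",
        "-q:v", "3",  # JPEG quality (lower is better); value is an assumption
        str(pattern),
    ]


cmd = build_sampling_command(Path("GX010001.MP4"), Path("frames"))
print(" ".join(cmd))
```

To actually run it, create the output directory first and pass the list to `subprocess.run(cmd, check=True)`.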

justinram11

4 hours ago

Something I've enjoyed more than I expected is Google Photos and Apple Photos sending me photo memories and compilations of various things in my life and my kids' lives over the last decade.

I'm really bullish on taking more video of my kids, with the thought that it will become easier and easier for AI to put them into little compilations I can enjoy later.

goodmythical

an hour ago

You don't mind Google using your kids to train their models and advertising algorithms?

Years from now they'll be getting "hey look at BIKE BRANDS' NEWEST CHEAP BIKE REMEMBER WHEN YOU USED TO RIDE BIKE BRAND BIKES"

satvikpendem

34 minutes ago

I think most people really don't care, and/or will just adblock those sorts of things when they do arrive.

JMiao

3 hours ago

do you use android and ios, or is there another benefit to having personal media with both?

iliashad

3 hours ago

Can you please elaborate more?

robrain

7 hours ago

DaVinci 21 has indexing built-in (AI IntelliSearch). Not to diminish the work you did, but this is now available to many users (probably only Studio users since it has AI in the name)

iliashad

7 hours ago

Yes, I didn’t look at it. But does it upload your videos to the cloud or process them locally? And does it let you provide custom face data to help label faces in your videos?

I think Adobe Premiere Pro has it as well, but cloud-processed.

GreenSalem

17 minutes ago

A lawyer I know who specialises in rape, and is excellent at getting the obviously guilty exonerated, lost a case last year because of GoPro videos.

Her client was recording while committing the abhorrent crime. The criminal would otherwise have got off.

From my perspective, the GoPro camera produced a good outcome. Still, one has to wonder why anyone would record their criminal actions.

Beijinger

10 hours ago

Does it work for porn collections too?

pduggishetti

9 hours ago

You'll need a LoRA for this; porn content rejection is heavy. Or you'll need an abliterated model, though I'm not sure if vision also works.

You might want to add something like a YOLO fine-tune to detect scenes, plus face recognition too.

vorticalbox

7 hours ago

Vision still works perfectly fine in abliterated models.

pduggishetti

6 hours ago

Never tried any of this for porn, just speaking out how I would go about it tbh!

dotancohen

5 hours ago

For GP's purpose, can face recognition techniques be repurposed for, um, other body parts recognition? Sometimes the actresses are facing away from camera. There are exposed lips, if that helps.

iliashad

7 hours ago

Why is it always the same question? Hahah. I posted my project on Reddit and got the same one, hahah

lifestyleguru

9 hours ago

Last time I tried whisper, it hallucinated an elaborate conversation from sounds of slapping and moaning and it took minutes to spit every single line of it.

3eb7988a1663

8 hours ago

Parakeet has been trained to detect non-voice sounds and exclude that from identification, so you might have better luck with that family.

dotancohen

5 hours ago

If I remember correctly, the Whisper documentation actually recommends trimming non-speech portions, as the models hallucinate heavily during those portions.

supertroop

9 hours ago

Not sure if you’re being sarcastic, but I think this is an interesting question. Would DeepSeek be useful here since it is local?

fibers

5 hours ago

just because it is local does not mean it wouldn't reject explicit content. you can definitely try to find abliterated models and attempt to use unsloth or something similar to tune it properly.

okr

5 hours ago

Depends how deep you wanna go.

WarOnPrivacy

8 hours ago

I was surprised to learn that the M1 Max CPU is an ARM/SoC, comparable to an 11th gen Intel i9.

Do I have it right? Would Windows ARM performance be similar for those CPUs?

ref: https://www.cpubenchmark.net/compare/4585vs4245/Apple-M1-Max...

pachouli-please

7 hours ago

It's also a bit apples (heh) to oranges for a handful of reasons, but the most impactful are:

- "unified" RAM makes all the system RAM available as VRAM
- a dedicated AI coprocessor thingy

Both of these allow the Apple silicon chips to crush conventional CPUs in these kinds of AI model workloads

No idea about what the windows arm stuff is capable of. I know they use Qualcomm snapdragon chips though.

voidmain0001

an hour ago

No comparison. M1 Max has 400GB/s RAM bandwidth while Snapdragon X2 Elite, the latest and greatest, has 228GB/s RAM bandwidth.

owldown

7 hours ago

“Comparable” is maybe true if we are talking about single core performance, but for memory bandwidth, the M1 Max is about 8 times faster. Wider bus, lower latency, not even close.

iliashad

7 hours ago

To your question, I can’t confirm or deny that, because I haven't tried this project on a Windows machine or a machine with this config yet

asdfasgasdgasdg

2 hours ago

Cool build but the example videos you provide at the end are . . . not what I would hope for when thinking about the highlights of 2000+ videos of biking? For example the dog barking video only has one scene repeated two or three times and it's five seconds long?

iliashad

2 hours ago

Fair enough. What would you like to see as an example video? I'll make it.

For the dog barking videos, those are only the scenes where there is a dog barking sound in the video.

I'll keep adding more prompts and example videos, so keep an eye out for that

asdfasgasdgasdg

2 hours ago

I don't have any preconceptions about specific content I want to see. I'd just think that so many hours of such cool adventures would have greater variety. It made me wonder if your AI really did such a good job of indexing it. It made me think maybe the tech isn't quite ready yet?

Did you ever visit crazyguyonabike.com? A long time ago I had the pleasure of following the journey of a friend of a friend of a friend on that site:

https://www.crazyguyonabike.com/doc/?doc_id=2405

Stuff like that I guess?

cake-rusk

5 hours ago

I have an RTX 5090 card but it only has 32 GB RAM, can something like this work on my machine?

iliashad

3 hours ago

Yes, and it’ll be much faster than what I got on my computer

tontonius

6 hours ago

if anyone is interested in searching large video collections locally and offline, I suggest taking a look at Jumper https://docs.getjumper.io

comes with some nifty features like NLE integrations, people search, MCP, API etc

Disclaimer: one of the co-founders

dotancohen

5 hours ago

The link just timed out for me. I'm in Israel, connecting via residential WiFi. All other sites that I regularly use connect just fine.

fl0id

8 hours ago

it is possible to use apple gpu with containers. either with podman + krunkit + recent mesa or with recent vllm-metal from docker https://www.docker.com/blog/docker-model-runner-vllm-metal-m...

iliashad

7 hours ago

I was looking for a way to run Docker containers with MPS and make use of the GPU. I think this project will be the solution; I’ll try it very soon and add support for it. Thank you, much appreciated

WhitneyLand

7 hours ago

I’d like to see embedding of actual video clips become practical in this type of workflow.

Frame-level embedding covers a lot, but can miss a lot of action-related searches.

synergy20

an hour ago

can a VLM be used instead, or is it too heavy and slow?

PreownedPlaid

2 hours ago

this is really cool. was looking to do something similar on mbp 64gb

iliashad

an hour ago

That's really great, thank you!

rho138

10 hours ago

This would fit best as a “Show HN:” post :)

culi

9 hours ago

The title should link to the "full article". I wonder if OP's domain name is banned or something and they're doing this to get around it

iliashad

8 hours ago

I tried to edit it and add Show HN, but it doesn't show the edited version. Thank you!

iliashad

8 hours ago

I would love your feedback and suggestions for improvements or features you'd like to have, whether in the source-available version, the desktop app, or the blog post itself.

m3kw9

7 hours ago

Grab frames, lower res, classify, combine metadata. Write to SQL

iliashad

7 hours ago

Not really. Grab frames, lower res, classify, combine metadata, transcribe the audio, convert that data (text, visual and audio) to embeddings, and save them in a vector DB and a SQL DB. That lets me do semantic search, RAG, search using a screenshot of the video to find the exact moment in the video, and search using an audio file as well. The vector DB unlocks other features too
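To make the "embeddings plus vector DB plus SQL DB" idea concrete, here is a toy sketch in Python. It is not the project's code: `toy_embed` is a bag-of-words stand-in for a real embedding model, a plain dict stands in for the vector DB, and the table layout, function names, and sample scenes are all made up for illustration.

```python
import math
import sqlite3
from collections import Counter


def toy_embed(text: str) -> Counter:
    """Stand-in for a real embedding model: a sparse bag-of-words vector."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(count * b[word] for word, count in a.items())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


# SQL side: per-scene metadata. Vector side: scene id -> embedding.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE scenes (id INTEGER PRIMARY KEY, video TEXT, second INTEGER, text TEXT)")
vectors: dict[int, Counter] = {}


def index_scene(video: str, second: int, description: str, transcript: str) -> int:
    """Store a scene's metadata in SQL and its embedding in the vector store."""
    text = f"{description} {transcript}".strip()
    cur = db.execute("INSERT INTO scenes (video, second, text) VALUES (?, ?, ?)",
                     (video, second, text))
    vectors[cur.lastrowid] = toy_embed(text)
    return cur.lastrowid


def search(query: str, top_k: int = 3) -> list[tuple]:
    """Rank scenes by similarity to the query, then fetch their metadata."""
    q = toy_embed(query)
    ranked = sorted(vectors, key=lambda sid: cosine(q, vectors[sid]), reverse=True)[:top_k]
    return [db.execute("SELECT video, second, text FROM scenes WHERE id = ?",
                       (sid,)).fetchone() for sid in ranked]


index_scene("ride_01.mp4", 12, "cyclist descending a mountain road", "")
index_scene("ride_01.mp4", 95, "dog barking beside a fence", "good boy")
index_scene("ride_02.mp4", 40, "sunset over a lake", "")

best = search("dog barking", top_k=1)[0]
print(best[0], best[1])
```

With a real model, `toy_embed` would return dense vectors for frames, audio, and text, which is what would make screenshot and audio-file queries possible. The split between metadata storage and similarity ranking would stay the same.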

ingvay7

5 hours ago

Really cool work and workflow. I strongly prefer this kind of local, open pipeline that I control over a dependency on Adobe tools and lock-in.

iliashad

3 hours ago

I agree with that, thank you for your feedback. Also, maybe you're not a video editor and you just wanna search your videos. The video editing integrations are optional and you have full control. You can switch between Adobe Premiere Pro, Final Cut Pro, or DaVinci Resolve

ingvay7

2 hours ago

cannot wait to incorporate this to my workflow. thanks

nyxtom

6 hours ago

Now this ^^ is an awesome use case!

iliashad

3 hours ago

Thank you. I'd like to know your use case for this kind of project and which prompt you want to generate?

Mawr

an hour ago

> Many of the videos I captured amazing moments, and sometimes it's kind of hard to watch the full videos to get those moments.

Yep. I had the same problem.

> Then, run the frame analysis pipeline [...] I have a face recognition plugin using my custom faces data, object detection, on-screen text, shot type, and scene description [...] we will have three vector DB collections that have all the information about our videos, like video location metadata, camera name, faces recognized, objects detected, on-screen text, transcription, description of each scene, and many more [...] we can get better indexed data if you use the advanced mode indexing to use the Qwen2.5-VL-7B-Instruct model to understand and describe your video much better, but at a slower indexing speed

Yeah, uhm... ok :)

If anyone else has a similar problem, the real solution is as follows:

1. When recording, if you witness an interesting moment worth saving later, press the power button — this will mark the current moment in the video as a chapter.

2. Find the chapters later when editing and cut them into clips.

3. You're done :)

This has two main benefits over the insanity above:

1. It's trivially simple instead of insanely complex and inefficient.

2. It will reliably catch all the stuff you find interesting, since you're the one doing the marking.

The downsides:

1. Doesn't work retroactively.

2. It may miss interesting stuff if you miss it at the time as well.

3. Only works for this use case.

4. Nerds won't salivate over your usage of cutting edge tech.