Show HN: AudioGhost AI – Run Meta's SAM-Audio on Consumer GPUs (4GB–6GB VRAM)

3 points | posted 9 hours ago
by 0x0funky

1 comment

0x0funky

9 hours ago

I built this because Meta's SAM-Audio (Segment Anything for Audio) is a breakthrough for interactive sound separation, but the original implementation is heavy, often requiring 30GB+ of VRAM because it loads the vision encoder and rerankers by default.

The Problem: Beyond the VRAM barrier, the Windows installation is a "dependency hell" due to mismatched FFmpeg and TorchCodec DLLs.

My Approach (The "Lite Mode"):

· Memory Trimming: I modified the model initialization to strip the vision encoder and the rerankers for pure audio tasks. This brings the footprint down to ~6GB VRAM for the Small model (bfloat16).

· Automated Setup: Bundled an install.bat that pins compatible versions of PyTorch and FFmpeg, so it works on Windows 11 out of the box.

· Architecture: Built with a Next.js (Tailwind v4) frontend and a FastAPI/Celery backend to provide a modern interface over the CLI.
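To give a feel for where the memory-trimming savings come from, here is a back-of-envelope sketch. The component names and parameter counts below are purely illustrative assumptions (not Meta's actual numbers, and not the real SAM-Audio API); the point is the arithmetic of dropping unused components and halving bytes per parameter with bfloat16:

```python
# Rough weight-footprint estimate: strip unused components, cast to bfloat16.
# All parameter counts here are made-up placeholders for illustration.
BYTES_PER_PARAM = {"float32": 4, "bfloat16": 2}

components = {
    "audio_encoder":   1.0e9,
    "separation_head": 0.5e9,
    "vision_encoder":  2.0e9,  # stripped in "Lite Mode"
    "reranker":        1.5e9,  # stripped in "Lite Mode"
}

def weight_vram_gib(parts, dtype="bfloat16"):
    """Weight footprint in GiB for the kept components (weights only;
    activations and CUDA context add more on top)."""
    total_params = sum(components[p] for p in parts)
    return total_params * BYTES_PER_PARAM[dtype] / 1024**3

full = weight_vram_gib(components, "float32")
lite = weight_vram_gib(["audio_encoder", "separation_head"], "bfloat16")
print(f"full fp32: {full:.1f} GiB, lite bf16: {lite:.1f} GiB")
```

With these toy numbers the lite configuration needs a small fraction of the full footprint; the same mechanism (fewer components, 2 bytes per parameter instead of 4) is what makes a 4GB–6GB card viable.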
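The FastAPI/Celery backend above follows the common enqueue-and-poll pattern for long-running GPU jobs. A minimal stdlib sketch of that pattern (this is a toy stand-in, not the actual AudioGhost code, which uses Celery for the worker):

```python
# Enqueue-and-poll job pattern, sketched with the stdlib only.
# A ThreadPoolExecutor stands in for the Celery worker; the two
# functions mirror a POST-then-poll HTTP flow.
import uuid
from concurrent.futures import ThreadPoolExecutor

jobs = {}                                  # job_id -> Future
pool = ThreadPoolExecutor(max_workers=1)   # stand-in for the GPU worker

def separate_audio(path):
    # Placeholder for the long-running separation task.
    return f"separated:{path}"

def submit(path):
    """POST /jobs equivalent: enqueue work, return an id immediately."""
    job_id = str(uuid.uuid4())
    jobs[job_id] = pool.submit(separate_audio, path)
    return job_id

def status(job_id):
    """GET /jobs/{id} equivalent: poll for completion."""
    fut = jobs[job_id]
    return {"done": fut.done(), "result": fut.result() if fut.done() else None}

jid = submit("mix.wav")
pool.shutdown(wait=True)   # demo only: block until the worker finishes
print(status(jid))         # {'done': True, 'result': 'separated:mix.wav'}
```

The design point is that the HTTP layer never blocks on the GPU: the request returns a job id instantly, and the frontend polls for the result.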

Everything is open-source (MIT). I hope this makes professional-grade audio separation accessible on consumer-grade hardware like the RTX 3060/4060.

I'm curious to hear from anyone testing this on different GPU architectures!