
meerab

9 months ago

Great tool!

Founder at VideoToBe.com here. I built a similar service, and it worked for a while. The moment it started to get traffic, it got blocked by YouTube. Your service may also get blocked when you start to scale. Your next iteration, the document-centric version, is more promising. It opens the door to various use cases and isn't limited to YouTube.

I pivoted to transcriptions and AI summarization for user uploaded content.

eigenvalue

9 months ago

Thanks! Already dealing with some issues with that. I think the solution is to keep a series of fallback approaches: as one method stops working, look for others, but keep enough in reserve that you never have a service interruption.
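To make the fallback idea concrete, here is a minimal sketch, not what the tool actually does. The fetcher functions (`primary_method`, `backup_method`) are hypothetical stand-ins for whatever transcript sources are available; each one takes a video ID and returns text, or raises if it is blocked.

```python
from typing import Callable, Iterable

# Hypothetical type: a fetcher takes a video ID and returns transcript text.
Fetcher = Callable[[str], str]


class AllFetchersFailed(Exception):
    """Raised when every fallback method has failed."""


def fetch_with_fallbacks(video_id: str, fetchers: Iterable[Fetcher]) -> str:
    """Try each fetcher in order and return the first non-empty result."""
    errors = []
    for fetch in fetchers:
        try:
            result = fetch(video_id)
            if result:
                return result
            errors.append(f"{fetch.__name__}: empty result")
        except Exception as exc:
            # Record the failure and move on to the next method in reserve.
            errors.append(f"{fetch.__name__}: {exc}")
    raise AllFetchersFailed("; ".join(errors))


# Hypothetical stand-ins for real transcript sources.
def primary_method(video_id: str) -> str:
    raise RuntimeError("blocked")


def backup_method(video_id: str) -> str:
    return f"transcript for {video_id}"


if __name__ == "__main__":
    print(fetch_with_fallbacks("abc", [primary_method, backup_method]))
```

Keeping the methods in an ordered list means a newly found approach can be added without touching the calling code, and the collected error messages show which methods have stopped working.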

skeptrune

9 months ago

Is there a market for making folks' content libraries searchable? I imagine creators might be interested in what they have posted for a given topic in the past to ease writing time for new content.

Lots of folks have asked us to build search for podcasts and YouTube channels, and we've tried it with the raw transcripts, but it doesn't work too well.

Chunking transcripts into semantic pieces for the search index by sentence splitting or other naive techniques isn't great, and I have not seen a product that can do speaker recognition out of the box.
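For reference, the naive sentence-splitting approach looks roughly like the sketch below. It is an illustration only; the function names and the `max_chars` limit are assumptions, not anyone's production code. It shows the limitation: chunks are cut purely by punctuation and length, with no regard for topic or speaker boundaries.

```python
import re


def split_sentences(text: str) -> list[str]:
    """Naively split after '.', '!' or '?' when followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def chunk_sentences(text: str, max_chars: int = 200) -> list[str]:
    """Greedily pack whole sentences into chunks of at most max_chars.

    A single sentence longer than max_chars becomes its own oversized chunk.
    """
    chunks, current = [], ""
    for sentence in split_sentences(text):
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_chars:
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


if __name__ == "__main__":
    for chunk in chunk_sentences("One. Two! Three?", max_chars=9):
        print(chunk)
```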

Speaker recognition for multi-speaker podcasts is probably the best chunking technique for those. However, I think you have the best approach for this style of educational content.

Also, cool project!!!

eigenvalue

9 months ago

Yes, I think that's one of the use cases for my project. If you're a YouTube creator, you've already invested a lot of care and energy into making your videos. If you could easily convert those videos in a fully automated way to written documents, complete direct transcripts, and other content like quizzes, then you can add those to your website and it should help you rank higher with search engines.

Once you have the complete direct transcripts and the optimized written documents, I think you could just use regular text search on that and it would work well. Something like Elastic or Algolia for a hosted option would work great. Even a prolific YouTuber probably isn't going to have more than a couple thousand pages' worth of text to search through. But yeah, I guess you could also build semantic search on top of that.
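As a rough sketch of what "regular text search" means at this scale, here is a toy inverted index in plain Python. It is a stand-in for illustration, not the Elastic or Algolia API, and the document IDs and texts are made up.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    """Lowercase the text and keep runs of letters and digits."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs: dict[str, str]) -> dict[str, set[str]]:
    """Map each token to the set of document IDs that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in tokenize(text):
            index[token].add(doc_id)
    return index


def search(index: dict[str, set[str]], query: str) -> set[str]:
    """Return the IDs of documents containing every query token."""
    tokens = tokenize(query)
    if not tokens:
        return set()
    results = set(index.get(tokens[0], set()))
    for token in tokens[1:]:
        results &= index.get(token, set())
    return results


if __name__ == "__main__":
    # Made-up documents standing in for per-video written documents.
    docs = {
        "v1": "Intro to eigenvalues and matrices",
        "v2": "Cooking pasta at home",
    }
    index = build_index(docs)
    print(search(index, "eigenvalues matrices"))
```

A hosted engine adds ranking, stemming, and typo tolerance on top of this basic idea, which is where most of the practical value comes from.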

I don't think speaker identification is that important for most videos; they tend to just have a single narrator/speaker. In any case, the written documents that my tool creates just sort of ignore that aspect and turn the content into more expository writing that conveys the same information. It's also hard to make a fully automated tool that does speaker identification where you know the identity of the speaker and it's not just Speaker1, Speaker2.

Thanks for your feedback! You should try submitting a video; you get free credits just for signing up, so you can try it with a few videos.

meerab

9 months ago

I have been experimenting with making content libraries searchable. I found out that it is relatively easy to build one using a RAG solution (LlamaIndex or LangChain) and a vector database.

-Meera@VideoToBe.com

Show HN: YouTube Transcript Optimizer – Turn Videos into Polished Documents

5 Comments
