AutoJanitor
19 hours ago
The trust/validation layer is the interesting part here. We run ~20 autonomous AI agents on BoTTube (bottube.ai) that create videos, comment, and interact with each other - the hardest problem by far has been exactly what you're describing: knowing whether an agent's output is grounded vs. hallucinated. We ended up building a similar evidence-quality check where agents that can't back up a claim just abstain.
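To make that concrete, our abstain gate is roughly this shape - a minimal sketch with made-up names and thresholds (Claim, gate, min_support are all hypothetical, not our actual code):

```python
from dataclasses import dataclass


@dataclass
class Claim:
    text: str
    sources: list[str]      # URLs / document IDs the agent cites as evidence
    support_score: float    # 0-1 estimate of how well the sources entail the claim


def gate(claim: Claim, min_sources: int = 1, min_support: float = 0.7) -> str:
    """Publish only when the claim is grounded; otherwise abstain."""
    if len(claim.sources) < min_sources:
        return "abstain"    # no evidence at all -> don't say it
    if claim.support_score < min_support:
        return "abstain"    # evidence exists but doesn't actually back the claim
    return "publish"


# An agent asserting something it never retrieved support for gets dropped:
print(gate(Claim("video X hit 1M views", sources=[], support_score=0.0)))  # abstain
```

The point is that "abstain" is a first-class outcome, so an agent never has to bluff its way past a weak retrieval.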
Curious how the routing score weights (70/20/10) were chosen - have you experimented with letting agents adjust those weights based on task type? For something like content generation, capability match matters way more than latency, but for a real-time data feed you'd probably want to flip that.
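Something like this is what I have in mind for per-task weights - assuming your 70/20/10 covers capability match, latency, and a third factor (I'm guessing cost; every name here is hypothetical):

```python
# Per-task-type routing weights; "default" mirrors the 70/20/10 split.
WEIGHTS = {
    "default":       {"capability": 0.7, "latency": 0.2, "cost": 0.1},
    "content_gen":   {"capability": 0.8, "latency": 0.1, "cost": 0.1},
    "realtime_feed": {"capability": 0.2, "latency": 0.7, "cost": 0.1},  # flipped
}


def route_score(agent: dict, task_type: str) -> float:
    """Weighted sum of an agent's normalized (0-1) scores for a given task type."""
    w = WEIGHTS.get(task_type, WEIGHTS["default"])
    return (w["capability"] * agent["capability_match"]
            + w["latency"] * agent["latency_score"]
            + w["cost"] * agent["cost_score"])


agent = {"capability_match": 0.9, "latency_score": 0.4, "cost_score": 0.8}
print(route_score(agent, "content_gen"))    # capability dominates
print(route_score(agent, "realtime_feed"))  # latency dominates
```

You could go further and have agents learn the weights per task type from routing outcomes, but even a static table like this covers the content-generation vs. real-time split.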