hackernews client

storystarling

15 days ago

Matches my experience trying to stabilize long LangGraph workflows. The regex checks are fine for formatting but miss the semantic drift that happens when you're actually injecting context. The rubric-based approach makes sense, but I'm not sure how a bootstrapped team implements this without the human labeling budget. I've tried using a stronger model to grade the outputs, but the latency overhead is brutal.

Evolving Instruction Following Beyond IFEval and "Avoid the Letter C"

1 Comment
