jlukecarlson
2 hours ago
I appreciate the details shared in this paper, but it'd be great if they open-sourced their implementation!
7 hours ago
Curious whether the behaviour-driven testing could be done by another LLM agent (or a group of agents), with one LLM agent testing another. Could that lead to a self-improving loop?
7 hours ago
A powerful move beyond benchmarks: this paper redefines LLM evaluation through realistic, behavior-driven testing.
7 hours ago
Very interesting work.
5 hours ago
Excellent work
5 hours ago
Interesting
7 hours ago
Nice Work
8 hours ago
Nice work
8 hours ago
Great work
8 hours ago
interesting