hackernews client

fesens

12 hours ago

Current benchmarks have ceilings, usually 100%. This benchmark aims to be long-lasting, highly correlated with the ability to solve real-world problems and follow complex instructions, and unbounded (meaning scores can always go higher).

fabiofachini92

12 hours ago

Amazing!

HWE Bench: A new unbounded Benchmark for LLMs (GPT 5.5 is on top)

2 Comments
