Evaluation of OpenAI O1: Opportunities and Challenges of AGI

3 points, posted 12 hours ago

2 Comments

nopinsight

12 hours ago

The paper introduces AGI-Benchmark 1.0.

"AGI-Benchmark 1.0 is designed to assess a model’s ability to tackle intricate, multi-step reasoning problems across a diverse set of domains."

See pp. 13-14 for the list of tasks in 27 categories. It's diverse indeed.

AIFounder

11 hours ago

[dead]