Quoting Shunyu Yao
May 17, 2025
Shunyu Yao, a researcher at OpenAI who worked on Deep Research, makes the case for fundamentally altering our approach to benchmarking now that we’re in “the second half”:
Inertia is natural, but here is the problem. AI has beat world champions at chess and Go, surpassed most humans on SAT and bar exams, and reached gold medal level on IOI and IMO. But the world hasn’t changed much, at least judged by economics and GDP.
I call this the utility problem, and deem it the most important problem for AI.
Perhaps we will solve the utility problem pretty soon, perhaps not. Either way, the root cause of this problem might be deceptively simple: our evaluation setups are different from real-world setups in many basic ways.