OpenAI employee publicly accuses xAI of publishing misleading Grok 3 benchmark results

GoldenOctober2024

2025-02-23 02:44:46


Jinshi Data, February 23: An OpenAI employee has publicly accused xAI, Elon Musk's AI company, of publishing misleading benchmark results for its latest AI model, Grok 3. Igor Babuschkin, a co-founder of xAI, responded that the company did nothing wrong. xAI's chart shows two versions of Grok 3, Grok 3 Reasoning Beta and Grok 3 mini Reasoning, outperforming o3-mini-high, OpenAI's strongest currently available model, on AIME 2025. OpenAI employees quickly pointed out on X, however, that the chart omits o3-mini-high's AIME 2025 score under the "cons@64" setting (consensus@64, in which the model answers each problem 64 times and its most frequent answer is taken as final), a setting that typically raises scores. Babuschkin countered on X that OpenAI has published similarly misleading benchmark charts in the past, although those charts compared the performance of OpenAI's own models.
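To illustrate why the "cons@64" condition matters, here is a minimal Python sketch of how a consensus (majority-vote) score can differ from a single-attempt score on the same model outputs. The function names and the toy answers are hypothetical and invented for illustration only; this is not xAI's or OpenAI's evaluation code, and the numbers are not real benchmark results.

```python
from collections import Counter


def consensus_answer(samples):
    """Return the most frequent answer among the sampled answers (majority vote)."""
    return Counter(samples).most_common(1)[0][0]


def score_single_attempt(problems):
    """Fraction of problems where the first sampled answer is correct."""
    hits = sum(p["samples"][0] == p["truth"] for p in problems)
    return hits / len(problems)


def score_consensus(problems, k=64):
    """Fraction of problems where the majority answer over the first k samples is correct."""
    hits = sum(consensus_answer(p["samples"][:k]) == p["truth"] for p in problems)
    return hits / len(problems)


# Hypothetical toy data: 2 problems with 4 sampled answers each (a real run would use 64).
problems = [
    {"truth": "204", "samples": ["113", "204", "204", "204"]},
    {"truth": "17", "samples": ["17", "17", "23", "17"]},
]

print(score_single_attempt(problems))   # single-attempt score
print(score_consensus(problems, k=4))   # consensus score over 4 samples
```

In this toy case the first attempt on one problem is wrong while the majority answer is right, so the consensus score comes out higher than the single-attempt score. Comparing one model's consensus score against another model's single-attempt score would therefore not be a like-for-like comparison.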


