
How Far Are (M)LLMs from Humans in the Latest High School Physics Olympiad Benchmark?

Authors: Fangchen Yu, Haiyuan Wan, Qianjia Cheng, Yuchen Zhang, Jiacheng Chen, Fujun Han, Yulun Wu, Junchi Yao, Ruilizhen Hu, Ning Ding, Yu Cheng, Tao Chen, Lei Bai, Dongzhan Zho

View a PDF of the paper titled HiPhO: How Far Are (M)LLMs from Humans in the Latest High School Physics Olympiad Benchmark?


Abstract: Recently, the physical capabilities of (M)LLMs have attracted increasing attention. However, existing physics benchmarks suffer from two major gaps: they do not provide systematic, up-to-date coverage of real-world physics competitions such as physics Olympiads, and they do not allow direct performance comparison with humans. To fill these gaps, we present HiPhO, the first benchmark dedicated to high school physics Olympiads with human-aligned evaluation. Specifically, HiPhO highlights three key innovations. (1) Comprehensive data: it compiles the 13 latest Olympiad exams from 2024-2025, spans both international and regional competitions, and covers mixed modalities, with problems ranging from text-only to diagram-based. (2) Professional evaluation: we adopt the official marking schemes to perform fine-grained grading at both the answer and the step level, fully aligned with human examiners to ensure high-quality, domain-specific evaluation. (3) Comparison with human contestants: we assign gold, silver, and bronze medals to models based on official medal thresholds, thus enabling direct comparison between (M)LLMs and human contestants. Our large-scale evaluation of 30 state-of-the-art (M)LLMs shows that, across the 13 exams, open-source MLLMs mostly remain at or below the bronze level; open-source LLMs show promising progress with multiple golds; closed-source reasoning MLLMs can achieve 6 to 12 gold medals; and most models still have a significant gap from full marks. These results highlight the performance gap between open-source models and top students, the strong reasoning capabilities of closed-source models, and the remaining room for improvement. HiPhO, a human-aligned Olympiad benchmark for multimodal physical reasoning, is open-source at this https URL with a public leaderboard at this https URL.
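The medal assignment described in innovation (3) can be illustrated with a minimal sketch. It assumes each exam publishes a minimum score per medal tier and that a model's total score is compared against those cutoffs from highest to lowest. The function name `assign_medal`, the dictionary layout, and the numeric thresholds are hypothetical and are not taken from the paper or from any real exam.

```python
from typing import Mapping

# Medal tiers ordered from highest to lowest.
MEDAL_ORDER = ("gold", "silver", "bronze")


def assign_medal(score: float, thresholds: Mapping[str, float]) -> str:
    """Return the highest medal whose minimum score is reached, else 'none'."""
    for medal in MEDAL_ORDER:
        if score >= thresholds[medal]:
            return medal
    return "none"


# Hypothetical cutoffs for a single exam (illustrative only, NOT official values).
example_thresholds = {"gold": 30.0, "silver": 22.0, "bronze": 15.0}

if __name__ == "__main__":
    for model_score in (31.5, 24.0, 15.0, 9.0):
        print(model_score, assign_medal(model_score, example_thresholds))
```

Because the comparison uses the same cutoffs applied to human contestants, a model's result on each exam can be read directly on the human medal scale.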

Submission history

From: Fangchen Yu
[v1] Tue, 9 Sep 2025 16:24:51 UTC (6,323 KB)
[v2] Wed, 10 Sep 2025 11:05:31 UTC (6,323 KB)
[v3] Fri, 12 Sep 2025 18:16:53 UTC (6,324 KB)


2025-09-16 04:00:00
