Probing the Critical Point (CritPt) of AI Reasoning: a Frontier Physics Research Benchmark
By Minhui Zhu and 64 other authors
Abstract: While large language models (LLMs) with reasoning capabilities are progressing rapidly on high-school math competitions and coding, can they reason effectively through the complex, open-ended challenges found in frontier physics research? And crucially, what kinds of reasoning tasks do physicists want LLMs to assist with? To address these questions, we present CritPt (Complex Research using Integrated Thinking - Physics Test, pronounced "critical point"), the first benchmark designed to test LLMs on unpublished, research-level reasoning tasks. It broadly covers modern physics research areas, including high energy physics, nonlinear dynamics, fluid dynamics, and biophysics. CritPt consists of 71 composite research challenges designed to simulate full-scale research projects, which are also decomposed into 190 simpler checkpoint tasks for more fine-grained insights. All problems are newly created by more than 50 active physics researchers based on their own research. Every problem is hand-curated to admit a guess-resistant and machine-verifiable answer, and is evaluated by an automated grading pipeline heavily customized for advanced physics-specific output formats. We find that while current state-of-the-art LLMs show early promise on isolated checkpoints, they remain far from being able to reliably solve full research-scale challenges: the best average accuracy among base models is only 4.0%, achieved by GPT-5 (high), rising moderately to around 10% when the models are equipped with coding tools. Through the realistic yet standardized evaluation offered by CritPt, we highlight a large gap between current model capabilities and the demands of realistic physics research, and we offer a foundation to guide the development of scientifically grounded AI tools.
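The abstract does not describe how answers are graded or how "average accuracy" is aggregated. The following is a minimal, hypothetical sketch, not the CritPt pipeline. It assumes numeric answers checked against a reference within a relative tolerance, and accuracy computed per challenge over repeated attempts and then averaged across challenges. The names `grade_numeric` and `average_accuracy` and the tolerance value are illustrative assumptions.

```python
import math
from statistics import mean


# Hypothetical grader (assumption, not CritPt's method): accept a numeric
# answer if it matches the reference within a relative tolerance.
def grade_numeric(answer: float, reference: float, rel_tol: float = 1e-3) -> bool:
    return math.isclose(answer, reference, rel_tol=rel_tol)


# Hypothetical aggregation (assumption): each challenge is attempted several
# times; per-challenge accuracy is the fraction of correct attempts, and the
# reported figure is the mean of those fractions across challenges.
def average_accuracy(results: dict[str, list[bool]]) -> float:
    per_challenge = [sum(runs) / len(runs) for runs in results.values()]
    return mean(per_challenge)


if __name__ == "__main__":
    results = {
        "challenge_a": [grade_numeric(1.0004, 1.0), grade_numeric(1.2, 1.0)],
        "challenge_b": [grade_numeric(3.0, 2.0), grade_numeric(2.5, 2.0)],
    }
    print(f"average accuracy: {average_accuracy(results):.1%}")  # prints "average accuracy: 25.0%"
```

Averaging per challenge first, rather than pooling all attempts, keeps challenges with more attempts from dominating the score. This is only one plausible design choice.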
Submission history
From: Minhui Zhu
[v1] Tue, 30 Sep 2025 17:34:03 UTC (697 KB)
[v2] Wed, 1 Oct 2025 02:12:55 UTC (697 KB)
2025-10-02 04:00:00



