[2411.00863] Next-Token Prediction Task Assumes Optimal Data Ordering for LLM Training in Proof Generation

Authors: Chenyang An, Shima Imani, Feng Yao, Chengyu Dong, Ali Abbasi, Harsh Shrivastava, Samuel Buss, Jingbo Shang, Gayathri Mahalingam, Pramod Sharma, Maurice Diesendruck

View a PDF of the paper titled Next-Token Prediction Task Assumes Optimal Data Ordering for LLM Training in Proof Generation, by Chenyang An and 10 other authors

View PDF HTML (experimental)

Abstract: In the field of large language model (LLM)-based proof generation, despite extensive training on large datasets such as ArXiv, LLMs still exhibit only modest performance on proving tasks of moderate difficulty. We believe this is partly due to the widespread presence of suboptimal ordering within the data for each proof used in training. For example, published proofs often follow a purely logical order, in which each step logically follows from the previous steps based on deductive rules. This order is designed to facilitate verification of the proof's correctness, rather than to help people and models learn the discovery process of the proof. In proof generation, we argue that the optimal order for one training data sample occurs when the relevant intermediate supervision for a particular proof step is always positioned to the left of that proof step in the proof. We call such an order the intuitively sequential order. We validate our claims using two tasks: intuitionistic propositional logic theorem proving and digit multiplication. Our experiments verify the order effect and provide support for our explanations. We demonstrate that training is most effective when the proof is in the intuitively sequential order. Moreover, the order effect and the performance gap between models trained on different data orders can be substantial, with an 11 percent improvement in proof success rate observed in the propositional logic theorem-proving task between models trained on the optimal order and those trained on the worst order. Lastly, we define a common type of order issue in advanced mathematics proofs and find that 17.3 percent of theorems with nontrivial proofs in the first two chapters of a graduate-level mathematics textbook suffer from this issue. A detailed list of those proofs is provided in the appendix.
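To make the ordering idea concrete on the digit-multiplication task, the sketch below serializes one training sample so that every partial product (the intermediate supervision) appears to the left of the final sum that depends on it. The serialization format and function name here are hypothetical illustrations, not the paper's actual data format:

```python
def multiplication_sample(a: int, b: int) -> str:
    """Serialize a*b so partial products precede the final sum.

    This mimics an "intuitively sequential order": each intermediate
    result is emitted before the step (the sum) that consumes it,
    so a left-to-right next-token predictor sees its supervision first.
    """
    # Walk b's digits from least significant to most significant.
    digits = [int(d) for d in str(b)][::-1]
    # Partial products, each already scaled by its place value.
    partials = [a * d * 10**i for i, d in enumerate(digits)]
    steps = " ".join(
        f"{a}*{d}*10^{i}={p}"
        for (i, d), p in zip(enumerate(digits), partials)
    )
    return f"{a}*{b}: {steps} sum={sum(partials)}"

print(multiplication_sample(23, 47))
# → 23*47: 23*7*10^0=161 23*4*10^1=920 sum=1081
```

Reversing this layout (emitting `sum=...` before the partial products) would give the "worst" order in the sense discussed above: the model would have to produce the answer before any of the intermediate evidence it needs.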

Submission history

From: Chenyang An [view email]
[v1]

Wed, 30 Oct 2024 18:00:04 UTC (308 KB)
[v2]

Thu, 3 Jul 2025 15:14:51 UTC (271 KB)
