
DeepSeek-AI Released DeepSeek-Prover-V2: An Open-Source Large Language Model Designed for Formal Theorem Proving through Subgoal Decomposition and Reinforcement Learning

The development of formal mathematical reasoning into a specialized subfield of artificial intelligence demands strict logical consistency. Unlike informal problem solving, which tolerates intuition and loosely stated heuristics, formal theorem proving requires that every step be fully specified, precise, and verifiable by a computer. Proof assistants such as Lean, Coq, and Isabelle provide the structured frameworks within which these formal proofs are constructed. Working in them demands logical soundness, with no room for omission, approximation, or unstated assumptions. This makes the challenge especially demanding for AI systems, particularly large language models, which excel at producing coherent natural-language responses but typically lack the rigor to produce formally verifiable proofs. Nevertheless, the desire to combine these strengths, the fluency of AI in informal reasoning and the structure of formal verification, has led to new innovations at the interface of language modeling and formal logic automation.
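To make concrete what "machine-verifiable" means here, below is a minimal Lean 4 example (illustrative only; the statement and the library lemma are standard Lean facts, not drawn from the paper). The kernel accepts the theorem only because every step is explicitly justified:

```lean
-- A proof assistant accepts a theorem only when each step is justified.
-- Here commutativity of natural-number addition is discharged by a
-- core library lemma, and Lean's kernel checks the whole proof.
theorem add_comm_example (a b : Nat) : a + b = b + a := by
  exact Nat.add_comm a b
```

If the final line were omitted or replaced with an unproven claim, Lean would reject the file rather than give the argument the benefit of the doubt.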

A major problem arises from the inability of current language models to bridge the conceptual gap between informal and formal reasoning. Language models typically excel at generating human-like explanations and solving math problems written in natural language. However, this reasoning is informal by nature and often lacks the structural precision required by formal logical systems. Whereas humans can jump intuitively from one deductive step to the next, proof assistants demand a fully specified sequence of steps, free of ambiguity. Consequently, the challenge is to guide AI models to produce logically coherent formal outputs from their informal and intuitive internal reasoning. The problem becomes increasingly complicated for advanced theorems in areas such as number theory or geometry, where precision is crucial.

Recent efforts have tried to address this problem by first directing models to create natural-language proof sketches, which are then translated manually or semi-automatically into formal proof steps. A well-known strategy decomposes a complex theorem into smaller subgoals. Each subgoal corresponds to a lemma that can be tackled independently and later combined to form a complete proof. Frameworks such as “Draft, Sketch, and Prove” have applied this idea, using language models to generate proof outlines that are then translated into formal language. Another approach uses hierarchical reinforcement learning, breaking down complex mathematical problems into simpler layers. However, these models often struggle to produce fully verifiable outputs in Lean or Coq environments. Moreover, the training data for such models is usually limited, and proof attempts frequently fail to yield the successful outcomes that would provide useful learning signals.
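The decomposition idea can be illustrated with a small Lean 4 sketch (the lemma names and statements below are invented for illustration, not taken from any of these frameworks): the main theorem is assembled from sublemmas that were proved independently.

```lean
-- Hypothetical decomposition: two sublemmas proved independently...
theorem step1 (n : Nat) : n + 0 = n := Nat.add_zero n
theorem step2 (n : Nat) : 0 + n = n := Nat.zero_add n

-- ...and then combined into the main result by rewriting with them.
theorem combined (n : Nat) : (n + 0) + (0 + n) = n + n := by
  rw [step1, step2]
```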

A team of researchers from DeepSeek-AI introduced a new model, DeepSeek-Prover-V2, designed to generate formal mathematical proofs by leveraging subgoal decomposition and reinforcement learning. The core of their approach uses DeepSeek-V3 to break a complex theorem down into manageable subgoals, each of which is translated into a "have" statement in Lean 4 with a "sorry" placeholder indicating that the proof is incomplete. These subgoals are then passed to a 7B-parameter prover model that completes each proof step. Once all steps are solved, they are synthesized into a complete Lean proof and paired with the original natural-language reasoning generated by DeepSeek-V3. This yields a rich cold-start dataset for reinforcement learning. Importantly, the model's training is bootstrapped entirely from synthetic data, with no human-annotated proofs used.
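In Lean 4, such a decomposed skeleton might look like the sketch below (the theorem and hypothesis names are invented for illustration; only the have/sorry pattern reflects the paper's description). Each "sorry" marks a subgoal the 7B prover must fill in before the proof is complete:

```lean
-- Sketch of a decomposed proof outline: the structure is fixed,
-- and each `sorry` is an unsolved subgoal handed to the 7B prover.
theorem example_goal (a b : Nat) (h : a ≤ b) : a ≤ b + 1 := by
  have h1 : b ≤ b + 1 := by
    sorry  -- subgoal left open for the prover model
  have h2 : a ≤ b + 1 := Nat.le_trans h h1
  exact h2
```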

The cold-start pipeline begins by prompting DeepSeek-V3 to create natural-language proof sketches. These sketches are converted into formal theorem statements with unresolved parts. The key innovation lies in recursively solving each subgoal with the 7B prover, which reduces computation costs while maintaining formal rigor. The researchers built a curriculum-learning framework that scales the complexity of training tasks over time. They also implemented two types of subgoal theorems: one that includes preceding subgoals as premises, and one that treats each subgoal independently. This dual structure was incorporated into the model's expert-iteration stage to train it progressively on harder problems. The model's capability was then strengthened through a consistency-based reward system, ensuring that every decomposed lemma appears in the final formal proof.
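A minimal Python sketch of this recursive pipeline and the consistency-based reward is shown below. All names here (draft_sketch, solve_subgoal, lean_verifies, Sketch) are hypothetical stand-ins for the components the paper describes, not identifiers from the released code:

```python
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class Sketch:
    skeleton: str         # Lean 4 proof outline containing `sorry` placeholders
    subgoals: list[str]   # the decomposed `have` statements

def prove(theorem: str,
          draft_sketch: Callable[[str], Sketch],          # stand-in for DeepSeek-V3
          solve_subgoal: Callable[[str], Optional[str]],  # stand-in for the 7B prover
          lean_verifies: Callable[[str], bool]) -> Optional[str]:
    """Draft a decomposition, solve each subgoal, reassemble, and keep
    the candidate only if the Lean checker verifies it end to end."""
    sketch = draft_sketch(theorem)
    candidate = sketch.skeleton
    for goal in sketch.subgoals:
        step = solve_subgoal(goal)
        if step is None:
            return None                      # one unsolved subgoal fails the attempt
        candidate = candidate.replace("sorry", step, 1)
    return candidate if lean_verifies(candidate) else None

def consistency_reward(candidate: str, sketch: Sketch,
                       lean_verifies: Callable[[str], bool]) -> float:
    """Binary RL reward: the proof must verify AND retain every decomposed
    subgoal, enforcing alignment between the sketch and the final proof."""
    if not lean_verifies(candidate):
        return 0.0
    return 1.0 if all(g in candidate for g in sketch.subgoals) else 0.0
```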


On the MiniF2F-test benchmark, the model achieves a pass rate of 88.9% with high sampling (Pass@8192), compared with 82.0% for Kimina-Prover and 64.7% for Goedel-Prover. It also solves 49 out of 658 problems from PutnamBench, a benchmark known for its difficult mathematical tasks. On the newly introduced ProverBench dataset, which includes 325 formalized problems, the model solves 6 out of 15 problems drawn from the AIME (American Invitational Mathematics Examination) competitions for 2024 and 2025. These benchmarks highlight the model's ability to generalize across formal tasks of varying complexity. Even when compared with DeepSeek-V3, which employs natural-language reasoning, the new model shows competitive performance, solving a comparable number of AIME problems while guaranteeing formal verification.

Several key takeaways from the research on DeepSeek-Prover-V2:

  • DeepSeek-Prover-V2 achieved an 88.9% pass rate on the MiniF2F-test benchmark (Pass@8192), the highest reported among formal reasoning models so far.
  • The model successfully solved 49 out of 658 problems from the PutnamBench dataset, which contains advanced mathematical challenges.
  • It handled 6 out of 15 problems from the recent AIME 2024–2025 competitions, demonstrating real-world applicability.
  • A new benchmark, ProverBench, comprising 325 formalized problems, was introduced to evaluate formal reasoning models.
  • The pipeline unifies natural-language proof sketching and formal proof construction by combining DeepSeek-V3 and a 7B prover model.
  • Two types of subgoal decomposition, one with and one without dependent premises, were used to train the model in a structured, curriculum-driven manner.
  • Reinforcement learning with a consistency-based reward significantly improved proof accuracy by enforcing structural alignment between sketch and solution.
  • The entire training strategy relies on synthetic cold-start data, eliminating dependence on manually written proofs.

Check out the Paper and GitHub Page. Also, don't forget to follow us on Twitter and join our Telegram Channel and LinkedIn Group. Don't forget to join our 90k+ ML SubReddit.



Asif Razzaq is the CEO of Marktechpost Media Inc. As a visionary entrepreneur and engineer, Asif is committed to harnessing the potential of artificial intelligence for social good. His most recent endeavor is the launch of an artificial intelligence media platform, Marktechpost, which stands out for its in-depth coverage of machine learning and deep learning news that is both technically sound and easily understandable by a wide audience. The platform boasts over 2 million monthly views, illustrating its popularity among readers.


2025-05-01 19:54:00
