Curriculum-based Unlocking Multimodal Reasoning via Phased Reinforcement Learning in Multimodal Small Language Models

View a PDF of the paper titled Infi-MMR: Curriculum-based Unlocking Multimodal Reasoning via Phased Reinforcement Learning in Multimodal Small Language Models, by Zeyu Liu and 11 other authors
Abstract: Recent advances in large language models (LLMs) have shown substantial progress in reasoning capabilities, as in DeepSeek-R1, which uses rule-based reinforcement learning to significantly enhance logical reasoning. However, extending these achievements to multimodal large language models (MLLMs) presents critical challenges, which are more pronounced for multimodal small language models (MSLMs) given their weaker foundational reasoning abilities. Among them are the scarcity of high-quality multimodal reasoning data and the risk that directly applying reinforcement learning produces complex yet incorrect reasoning. To address these challenges, we design a novel framework, Infi-MMR, that unlocks the reasoning potential of MSLMs through a carefully structured three-phase curriculum, and we propose our multimodal reasoning model Infi-MMR-3B. The first phase, Foundational Reasoning Activation, leverages high-quality textual reasoning datasets to activate and strengthen the model's logical reasoning capabilities. The second phase, Cross-Modal Reasoning Adaptation, uses caption-augmented multimodal data to facilitate the progressive transfer of reasoning skills to multimodal contexts. The third phase, Multimodal Reasoning Enhancement, uses curated, caption-free multimodal data to mitigate linguistic biases and promote robust cross-modal reasoning. Infi-MMR-3B achieves both state-of-the-art multimodal math reasoning ability (43.68% on MathVerse testmini, 27.04% on MathVision test, and 21.33% on OlympiadBench) and general reasoning ability (67.2% on MathVista testmini). Resources are available at this https URL.
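As an illustration of the three-phase curriculum described in the abstract, the following is a minimal sketch of how such a schedule might be represented and run in sequence. Only the ordering of the phases and their data properties (text only, then captioned images, then caption-free images) come from the abstract. The `Phase` class, `run_curriculum`, and the stand-in trainer `fake_train_phase` are hypothetical names introduced here, and no actual reinforcement learning is performed.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Phase:
    """One stage of a phased training curriculum (hypothetical structure)."""
    name: str
    uses_images: bool    # whether training samples include images
    uses_captions: bool  # whether images are accompanied by text captions


# Phase order and data properties follow the abstract; the rest is assumed.
CURRICULUM: List[Phase] = [
    Phase("Foundational Reasoning Activation", uses_images=False, uses_captions=False),
    Phase("Cross-Modal Reasoning Adaptation", uses_images=True, uses_captions=True),
    Phase("Multimodal Reasoning Enhancement", uses_images=True, uses_captions=False),
]


def run_curriculum(checkpoint: str, phases: List[Phase],
                   train_phase: Callable[[str, Phase], str]) -> str:
    """Train phases in order, feeding each phase's output checkpoint to the next."""
    for phase in phases:
        checkpoint = train_phase(checkpoint, phase)
    return checkpoint


def fake_train_phase(checkpoint: str, phase: Phase) -> str:
    """Stand-in trainer: records which phase ran instead of updating weights."""
    return f"{checkpoint} -> {phase.name}"


if __name__ == "__main__":
    print(run_curriculum("base-model", CURRICULUM, fake_train_phase))
```

The point of the sketch is the sequencing: each phase starts from the checkpoint produced by the previous one, so reasoning ability built on text is carried into the captioned and then caption-free multimodal settings.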
Submission history
From: Zeyu Liu
[v1] Thu, 29 May 2025 04:51:56 UTC (3,498 KB)
[v2] Fri, 6 Jun 2025 08:03:09 UTC (3,666 KB)
2025-06-09 04:00:00