Infi-MMR: Curriculum-based Unlocking Multimodal Reasoning via Phased Reinforcement Learning in Multimodal Small Language Models

[Submitted on 29 May 2025 (v1), last revised 6 Jun 2025 (this version, v2)]

Authors: Zeyu Liu, Yuhang Liu, Guanghao Zhu, Christi Xie, Zhen Li, Jianbo Yuan, Xinyao Wang, Qing Li, Shing-Ci Cheung, Shengyu Zhang, Fei Wu, Hongxia Yang

View a PDF of the paper titled Infi-MMR: Curriculum-based Unlocking Multimodal Reasoning via Phased Reinforcement Learning in Multimodal Small Language Models, by Zeyu Liu and 11 other authors

Abstract: Recent advances in large language models (LLMs) have brought substantial progress in reasoning capabilities. One example is DeepSeek-R1, which uses rule-based reinforcement learning to significantly improve logical reasoning. However, extending these achievements to multimodal large language models (MLLMs) presents critical challenges, and these are more pronounced for multimodal small language models (MSLMs) given their weaker foundational reasoning abilities. Among them are the scarcity of high-quality multimodal reasoning data and the risk that directly applying reinforcement learning produces complex yet incorrect reasoning. To address these challenges, we design a novel framework, Infi-MMR, that unlocks the reasoning potential of MSLMs through a carefully structured three-phase curriculum, and we propose our multimodal reasoning model, Infi-MMR-3B. The first phase, Foundational Reasoning Activation, uses high-quality textual reasoning datasets to activate and strengthen the model's logical reasoning capabilities. The second phase, Cross-Modal Reasoning Adaptation, uses caption-augmented multimodal data to facilitate the progressive transfer of reasoning skills to multimodal contexts. The third phase, Multimodal Reasoning Enhancement, uses curated, caption-free multimodal data to reduce linguistic biases and promote robust cross-modal reasoning. Infi-MMR-3B achieves both state-of-the-art multimodal math reasoning ability (43.68% on MathVerse testmini, 27.04% on MathVision test, 21.33% on OlympiadBench) and general reasoning ability (67.2% on MathVista testmini). Resources are available at this https URL.
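
The three-phase ordering described in the abstract can be pictured as a simple schedule. The snippet below is only an illustrative sketch, not the authors' implementation: the names `Phase`, `CURRICULUM`, `run_curriculum`, and `record_phase` are hypothetical, and the training step is replaced by a stand-in that merely records which phase ran. The one thing it takes from the abstract is the order of the phases and the kind of data each one uses (text only, then captioned multimodal, then caption-free multimodal).

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Phase:
    name: str            # phase name as given in the abstract
    modality: str        # "text" or "multimodal"
    uses_captions: bool  # whether captions accompany the images


# The three phases, in the order the abstract lists them.
CURRICULUM: List[Phase] = [
    Phase("Foundational Reasoning Activation", "text", False),
    Phase("Cross-Modal Reasoning Adaptation", "multimodal", True),
    Phase("Multimodal Reasoning Enhancement", "multimodal", False),
]


def run_curriculum(model_state, train_phase):
    """Call train_phase once per phase, passing the updated state along."""
    for phase in CURRICULUM:
        model_state = train_phase(model_state, phase)
    return model_state


def record_phase(history, phase):
    # Hypothetical stand-in for a reinforcement learning stage:
    # it only logs the phase name so the ordering can be inspected.
    return history + [phase.name]


if __name__ == "__main__":
    for name in run_curriculum([], record_phase):
        print(name)
```

In a real pipeline, `record_phase` would be replaced by whatever training routine is used for each stage; the point of the sketch is only that each phase starts from the model produced by the previous one.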