[2412.06845] 7B Fully Open Source Moxin-LLM/VLM — From Pretraining to GRPO-based Reinforcement Learning Enhancement

View a PDF of the paper titled 7B Fully Open Source Moxin-LLM/VLM -- From Pretraining to GRPO-based Reinforcement Learning Enhancement, by Pu Zhao and 17 other authors
View PDF | HTML (experimental)
Abstract: Recently, Large Language Models (LLMs) have undergone a significant transformation, marked by a rapid rise in both their popularity and capabilities. Leading this evolution are proprietary LLMs like GPT-4 and GPT-o1, which have captured widespread attention in the AI community due to their remarkable performance and versatility. Simultaneously, open-source LLMs, such as LLaMA, have made great contributions to the ever-increasing popularity of LLMs due to the ease with which the models can be customized and deployed across diverse applications. Although open-source LLMs present unprecedented opportunities for innovation and research, the commercialization of LLMs has raised concerns about transparency, reproducibility, and safety. Many open-source LLMs fail to meet fundamental transparency requirements by withholding essential components such as training code and data, which may hinder further innovations on LLMs. To mitigate this issue, we introduce Moxin 7B, a fully open-source LLM developed in accordance with the principles of open science, open source, open data, and open access. We release the pre-training code and configurations, training and fine-tuning datasets, and intermediate and final checkpoints, aiming to make continuous commitments to fully open-source LLMs. After pre-training the base model, we fine-tune the Moxin Base model with state-of-the-art (SOTA) post-training frameworks and instruction data to obtain the Moxin Instruct model. To improve reasoning capability, we further fine-tune our model with chain-of-thought data distilled from DeepSeek R1, and then apply Group Relative Policy Optimization (GRPO) following DeepSeek R1 to fine-tune the model, leading to the Moxin Reasoning model. Moreover, we develop our vision-language model based on the Moxin model. Experiments show that our models achieve superior performance in various evaluations such as zero-shot evaluation, few-shot evaluation, and CoT evaluation.
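For readers unfamiliar with the GRPO step mentioned above, the sketch below illustrates the group-relative reward normalization that characterizes GRPO in its standard formulation: a group of responses is sampled per prompt, each response is scored, and the z-scored reward within the group serves as the advantage, removing the need for a learned critic. This is a minimal illustration under that standard formulation, not the Moxin training code; the function name and reward values are hypothetical.

    # Minimal sketch of GRPO's group-relative advantage (illustrative only).
    import math

    def group_relative_advantages(rewards, eps=1e-8):
        """Normalize each reward against its group's mean and std.

        GRPO samples a group of G responses per prompt, scores each one,
        and uses the z-scored reward as the advantage for every token of
        that response -- no value/critic network is required.
        """
        mean = sum(rewards) / len(rewards)
        var = sum((r - mean) ** 2 for r in rewards) / len(rewards)
        std = math.sqrt(var)
        return [(r - mean) / (std + eps) for r in rewards]

    # Example: one prompt, a group of G = 4 sampled responses with toy rewards.
    rewards = [0.2, 0.9, 0.5, 0.4]
    print(group_relative_advantages(rewards))
    # Responses scoring above the group mean receive positive advantages.

In the full algorithm, these advantages are plugged into a PPO-style clipped policy objective with a KL penalty toward a reference model; the details of the reward design and training setup are specific to the paper.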
Submission history
From: Pu Zhao [view email]
[v1]
Sun, 8 Dec 2024 02:01:46 UTC (139 KB)
[v2]
Wed, 11 Dec 2024 19:03:58 UTC (142 KB)
[v3]
Thu, 10 Apr 2025 19:05:16 UTC (146 KB)
[v4]
Wed, 23 Apr 2025 01:38:02 UTC (147 KB)
[v5]
Wed, 11 Jun 2025 17:10:59 UTC (59 KB)