[2412.06845] 7B Fully Open Source Moxin-LLM/VLM — From Pretraining to GRPO-based Reinforcement Learning Enhancement

View a PDF of the paper titled 7B Fully Open Source Moxin-LLM/VLM -- From Pretraining to GRPO-based Reinforcement Learning Enhancement, by Pu Zhao and 17 other authors
View PDF | HTML (experimental)
Abstract: Recently, Large Language Models (LLMs) have undergone a significant transformation, marked by a rapid rise in both their popularity and capabilities. Leading this evolution are proprietary LLMs like GPT-4 and GPT-o1, which have captured widespread attention in the AI community due to their remarkable performance and versatility. Simultaneously, open-source LLMs, such as LLaMA, have made great contributions to the ever-increasing popularity of LLMs due to the ease with which the models can be customized and deployed across diverse applications. Although open-source LLMs present unprecedented opportunities for innovation and research, the commercialization of LLMs has raised concerns about transparency, reproducibility, and safety. Many open-source LLMs fail to meet fundamental transparency requirements by withholding essential components such as training code and data, which may hinder further innovations on LLMs. To mitigate this issue, we introduce Moxin 7B, a fully open-source LLM developed in accordance with the principles of open science, open source, open data, and open access. We release the pre-training code and configurations, training and fine-tuning datasets, and intermediate and final checkpoints, aiming to make continuous commitments to fully open-source LLMs. After pre-training the base model, we fine-tune the Moxin Base model with state-of-the-art (SOTA) post-training frameworks and instruction data to obtain the Moxin Instruct model. To improve reasoning capability, we further fine-tune our model with chain-of-thought data distilled from DeepSeek R1, and then apply Group Relative Policy Optimization (GRPO) following DeepSeek R1 to fine-tune the model, leading to the Moxin Reasoning model. Moreover, we develop our vision-language model based on the Moxin model. Experiments show that our models achieve superior performance in various evaluations such as zero-shot evaluation, few-shot evaluation, and CoT evaluation.
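For readers unfamiliar with the GRPO step mentioned above, the sketch below illustrates the group-relative reward normalization that characterizes GRPO in its standard formulation: a group of responses is sampled per prompt, each response is scored, and the z-scored reward within the group serves as the advantage, removing the need for a learned critic. This is a minimal illustration under that standard formulation, not the Moxin training code; the function name and reward values are hypothetical.

    # Minimal sketch of GRPO's group-relative advantage (illustrative only).
    import math

    def group_relative_advantages(rewards, eps=1e-8):
        """Normalize each reward against its group's mean and std.

        GRPO samples a group of G responses per prompt, scores each one,
        and uses the z-scored reward as the advantage for every token of
        that response -- no value/critic network is required.
        """
        mean = sum(rewards) / len(rewards)
        var = sum((r - mean) ** 2 for r in rewards) / len(rewards)
        std = math.sqrt(var)
        return [(r - mean) / (std + eps) for r in rewards]

    # Example: one prompt, a group of G = 4 sampled responses with toy rewards.
    rewards = [0.2, 0.9, 0.5, 0.4]
    print(group_relative_advantages(rewards))
    # Responses scoring above the group mean receive positive advantages.

In the full algorithm, these advantages are plugged into a PPO-style clipped policy objective with a KL penalty toward a reference model; the details of the reward design and training setup are specific to the paper.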
Submission history
From: Pu Zhao [view email]
[v1]
Sun, 8 Dec 2024 02:01:46 UTC (139 KB)
[v2]
Wed, 11 Dec 2024 19:03:58 UTC (142 KB)
[v3]
Thu, 10 Apr 2025 19:05:16 UTC (146 KB)
[v4]
Wed, 23 Apr 2025 01:38:02 UTC (147 KB)
[v5]
Wed, 11 Jun 2025 17:10:59 UTC (59 KB)