[2505.00949] Llama-Nemotron: Efficient Reasoning Models

View a PDF of the paper titled Llama-Nemotron: Efficient Reasoning Models, by Akhiad Bercovich and 135 other authors
View PDF · HTML (experimental)
Abstract: We introduce the Llama-Nemotron series, an open family of heterogeneous reasoning models that deliver exceptional reasoning capabilities, inference efficiency, and an open license for enterprise use. The family comes in three sizes: Nano (8B), Super (49B), and Ultra (253B). It performs competitively with state-of-the-art reasoning models such as DeepSeek-R1 while offering superior inference throughput and memory efficiency. In this report, we discuss the training procedure for these models, which entails using neural architecture search from Llama 3 models for accelerated inference, knowledge distillation, and continued pretraining, followed by a reasoning-focused post-training stage consisting of two main parts: large-scale supervised fine-tuning and reinforcement learning. Llama-Nemotron models are the first open-source models to support a dynamic reasoning toggle, allowing users to switch between standard chat and reasoning modes during inference. To further support open research and facilitate model development, we provide the following resources: 1. We release the Llama-Nemotron reasoning models (LN-Nano, LN-Super, and LN-Ultra) under the commercially permissive NVIDIA Open Model License Agreement. 2. We release the Llama-Nemotron post-training dataset. 3. We also release our training codebases: NeMo, NeMo-Aligner, and Megatron-LM.
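The dynamic reasoning toggle mentioned above is controlled through the system prompt rather than a dedicated API parameter. A minimal sketch of how a caller might build the message list, assuming the "detailed thinking on/off" control phrase documented in NVIDIA's model cards (the exact wording is not stated in this abstract):

```python
# Hedged sketch: toggling Llama-Nemotron's reasoning mode via the system
# prompt. The control strings "detailed thinking on/off" are taken from
# NVIDIA's model cards and are an assumption here, not part of this abstract.

def build_messages(user_prompt: str, reasoning: bool) -> list[dict]:
    """Return an OpenAI-style message list with the reasoning mode
    selected by the system prompt."""
    system = "detailed thinking on" if reasoning else "detailed thinking off"
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": user_prompt},
    ]

# Reasoning mode on: the model emits its chain of thought before answering.
msgs = build_messages("Solve 23 * 17 step by step.", reasoning=True)
print(msgs[0]["content"])
```

The same message list can then be passed to any chat-completion endpoint serving the model; switching modes is just a matter of changing the system message between turns.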
Submission history
From: Jiaqi Zeng [view email]
[v1]
Fri, 2 May 2025 01:35:35 UTC (2,263 KB)
[v2]
Mon, 5 May 2025 21:03:44 UTC (2,263 KB)
[v3]
Wed, 14 May 2025 16:47:23 UTC (2,263 KB)
[v4]
Mon, 30 Jun 2025 20:37:51 UTC (712 KB)