[2505.00949] Llama-Nemotron: Efficient Reasoning Models

Authors: Akhiad Bercovich, Itay Levy, Izik Golan, Mohammad Dabbah, Ran El-Yaniv, Omri Puny, Ido Galil, Zach Moshe, Tomer Ronen, Najeeb Nabwani, Ido Shahaf, Oren Tropp, Ehud Karpas, Ran Zilberstein, et al. (122 additional authors not shown)

View a PDF of the paper titled Llama-Nemotron: Efficient Reasoning Models, by Akhiad Bercovich and 135 other authors

View PDF | HTML (experimental)

Abstract: We introduce the Llama-Nemotron series of models, an open family of heterogeneous reasoning models that deliver exceptional reasoning capabilities, inference efficiency, and an open license for enterprise use. The family comes in three sizes, Nano (8B), Super (49B), and Ultra (253B), and performs competitively with state-of-the-art reasoning models such as DeepSeek-R1 while offering superior inference throughput and memory efficiency. In this report, we discuss the training procedure for these models, which entails using neural architecture search from Llama 3 models for accelerated inference, knowledge distillation, and continued pretraining, followed by a reasoning-focused post-training stage consisting of two main parts: supervised fine-tuning and large-scale reinforcement learning. Llama-Nemotron models are the first open-source models to support a dynamic reasoning toggle, allowing users to switch between standard chat and reasoning modes during inference. To further support open research and facilitate model development, we provide the following resources: 1. We release the Llama-Nemotron reasoning models, LN-Nano, LN-Super, and LN-Ultra, under the commercially permissive NVIDIA Open Model License Agreement. 2. We release the complete post-training dataset, Llama-Nemotron-Post-Training-Dataset. 3. We also release our training codebases: NeMo, NeMo-Aligner, and Megatron-LM.
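The dynamic reasoning toggle mentioned in the abstract is controlled at inference time through the system prompt. Below is a minimal sketch of how it might be used with the released LN-Nano checkpoint via Hugging Face Transformers; the repository name (nvidia/Llama-3.1-Nemotron-Nano-8B-v1) and the "detailed thinking on/off" prompt strings follow the public model card, but treat the exact identifiers as assumptions rather than a definitive recipe.

```python
# Minimal sketch of the dynamic reasoning toggle (assumed checkpoint name and
# system-prompt strings; verify against the model card before relying on them).
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "nvidia/Llama-3.1-Nemotron-Nano-8B-v1"  # LN-Nano (assumed repo name)
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)

def generate(prompt: str, reasoning: bool) -> str:
    # The same weights serve both modes; only the system prompt changes.
    messages = [
        {"role": "system",
         "content": "detailed thinking on" if reasoning else "detailed thinking off"},
        {"role": "user", "content": prompt},
    ]
    inputs = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt"
    ).to(model.device)
    outputs = model.generate(inputs, max_new_tokens=1024)
    # Decode only the newly generated tokens, not the prompt.
    return tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True)

print(generate("What is 17 * 23?", reasoning=True))   # emits a reasoning trace first
print(generate("What is 17 * 23?", reasoning=False))  # answers directly, chat-style
```

The design choice this illustrates is that switching between chat and reasoning modes requires no separate model or reloading, only a different system prompt per request.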

Submission history

From: Jiaqi Zeng
[v1]

Friday, 2 May 2025 01:35:35 UTC (2,263 KB)
[v2]

Monday, 5 May 2025 21:03:44 UTC (2,263 KB)
[v3]

Wednesday, 14 May 2025 16:47:23 UTC (2,263 KB)
[v4]

Monday, 30 June 2025 20:37:51 UTC (712 KB)
