
Nemotron-H: A Family of Accurate and Efficient Hybrid Mamba-Transformer Models

Authors: NVIDIA: Aaron Blakeman, Aarti Basant, Abhinav Khattar, Adithya Renduchintala, Akhiad Bercovich, Aleksander Ficek, Alexis Bjorlin, Ali Taghibakhshi, Amala Sanjay Deshmukh, Ashwin Poojary, Ayush Dattagupta, Balaram Buddharaju, Bobby Chen, Boris Ginsburg, Boxin Wang, Brandon Norick, Brian Butterfield, Bryan Catanzaro, Carlo del Mundo, Chengyu Dong, Christine Harvey, Christopher Parisien, Dan Su, David Mosallanezhad, Deepak Narayanan, Denys Fridman, Dima Rekesh, Ding Ma, Dmytro Pykhtar, Dong Ahn, Duncan Riach, Dusan Stosic, Eileen Long, Elad Segal, Ellie Evans, Eric Chung, Erick Galinkin, Evelina Bakhturina, Fuxiao Liu, Gargi Prasad, Gerald Shen, Guilin Liu, Guo Chen, Haifeng Qian, Helen Ngo, Hongbin Liu, Hui Li, Igor Gitman, Ilia Karmanov, Ivan Moshkov, Izik Golan, Jan Kautz, Jane Polak Scowcroft, Jason Sewall, Jiaqi Zeng, Jimmy Zhang, Jing Zhang, Jining Huang, Jocelyn Huang, Joey Conway, John Kamalu, Jon Barker, Keshav Santhanam, Kezhi Kong, Kirthi Sivamani, Krzysztof Pawelec, Kumar Anik, Kunlun Li, Lawrence McAfee, Leon Derczynski, Lindsey Pavao, Luis Vega, Lukas Voegtle, Makesh Narsimhan Sreedhar, Marcin Chochowski, Markus Kliegl

(100 additional authors not shown)

View a PDF of the paper titled Nemotron-H: A Family of Accurate and Efficient Hybrid Mamba-Transformer Models, by NVIDIA: Aaron Blakeman and 198 other authors


Abstract: As inference-time scaling becomes critical for enhanced reasoning capabilities, it is increasingly important to build models that are efficient to infer. We introduce Nemotron-H, a family of 8B and 56B/47B hybrid Mamba-Transformer models designed to reduce inference cost for a given accuracy level. To achieve this goal, we replace the majority of self-attention layers in the common Transformer model architecture with Mamba layers that perform constant computation and require constant memory per generated token. We show that Nemotron-H models offer either better or on-par accuracy compared to other similarly sized open-sourced Transformer models (e.g., Qwen-2.5-7B/72B and Llama-3.1-8B/70B), while being up to 3x faster at inference. To further increase inference speed and reduce the memory required at inference time, we created Nemotron-H-47B-Base from the 56B model using a new compression-via-pruning-and-distillation technique called MiniPuzzle. Nemotron-H-47B-Base achieves similar accuracy to the 56B model, but is 20% faster to infer. In addition, we introduce an FP8-based training recipe and show that it can achieve on-par results with BF16-based training. This recipe is used to train the 56B model. We are releasing Nemotron-H base model checkpoints with support in Hugging Face and NeMo.
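Since the abstract states that the base checkpoints are released with Hugging Face support, a minimal loading sketch is included below. The repository id (nvidia/Nemotron-H-8B-Base-8K) and the need for trust_remote_code are assumptions about how the checkpoints are published, not details taken from the paper; consult the actual model card before use.

```python
# Minimal sketch: loading a Nemotron-H base checkpoint via Hugging Face transformers.
# The repo id below is an assumed checkpoint name; adjust it to the published model card.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "nvidia/Nemotron-H-8B-Base-8K"  # assumption: published checkpoint name

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # BF16 is a safe inference dtype for these models
    device_map="auto",
    trust_remote_code=True,      # assumption: hybrid Mamba-Transformer blocks ship as custom modeling code
)

prompt = "The key advantage of hybrid Mamba-Transformer models is"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```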

Submission history

From: Deepak Narayanan [view email]
[v1]

Fri, 4 April 2025 17:41:58 UTC (716 KB)
[v2]

Thu, 10 April 2025 05:31:53 UTC (721 KB)
[v3]

Tue, 15 April 2025 14:36:01 UTC (716 KB)


