Nemotron-H: A Family of Accurate and Efficient Hybrid Mamba-Transformer Models

View a PDF of the paper titled Nemotron-H: A Family of Accurate and Efficient Hybrid Mamba-Transformer Models, from NVIDIA: Aaron Blakeman and 198 other authors
Abstract: As inference-time scaling becomes critical for enhanced reasoning capabilities, it is increasingly important to build models that are efficient to infer. We introduce Nemotron-H, a family of 8B and 56B/47B hybrid Mamba-Transformer models designed to reduce inference cost for a given level of accuracy. To achieve this goal, we replace the majority of self-attention layers in the common Transformer model architecture with Mamba layers that perform constant computation and require constant memory per generated token. We show that Nemotron-H models offer either better or on-par accuracy compared to other similarly-sized, state-of-the-art open-sourced Transformer models (e.g., Qwen-2.5-7B/72B and Llama-3.1-8B/70B), while being up to 3× faster at inference. To further increase inference speed and reduce the memory required at inference time, we created Nemotron-H-47B-Base from the 56B model using a new compression-via-pruning-and-distillation technique called MiniPuzzle. Nemotron-H-47B-Base achieves accuracy similar to the 56B model, but is 20% faster to infer. In addition, we introduce an FP8-based training recipe and show that it can achieve results on par with BF16-based training; this recipe is used to train the 56B model. We release the Nemotron-H base model checkpoints with support in Hugging Face and NeMo.
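Since the abstract notes that the base model checkpoints are released with Hugging Face support, a minimal sketch of loading and sampling from one of them via the transformers library is shown below. The repository id and the need for trust_remote_code are assumptions for illustration, not details confirmed by the abstract; check the NVIDIA organization on the Hugging Face Hub for the actual checkpoint names.

```python
# Minimal sketch: load an assumed Nemotron-H base checkpoint with Hugging Face
# transformers and generate a short continuation.
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed repository id for illustration only.
model_id = "nvidia/Nemotron-H-8B-Base-8K"

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",      # load in the dtype the checkpoint ships with
    device_map="auto",       # place layers across available GPUs/CPU automatically
    trust_remote_code=True,  # hybrid Mamba-Transformer blocks may need custom modeling code
)

prompt = "Hybrid Mamba-Transformer models reduce inference cost because"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```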
Submission history
From: Deepak Narayanan [view email]
[v1]
Fri, 4 Apr 2025 17:41:58 UTC (716 KB)
[v2]
Thu, 10 Apr 2025 05:31:53 UTC (721 KB)
[v3]
Tue, 15 Apr 2025 14:36:01 UTC (716 KB)