
Nvidia’s Blackwell Ultra Dominates MLPerf Inference

The field of machine learning moves quickly, and the benchmarks used to measure its progress must race to keep up. A case in point is MLPerf, the twice-yearly machine-learning competition sometimes called the “Olympics of AI,” which has introduced three new benchmarks, reflecting new directions in the field.

“Recently, it has been very difficult just trying to follow what’s happening in the field,” says one of MLPerf’s working-group co-chairs. “We see that the models are progressively getting bigger, and in the last two rounds we have introduced the largest models we’ve ever had.”

The chips tackling these new benchmarks came from the usual suspects: Nvidia, AMD, and Intel. Nvidia topped the charts with its new Blackwell Ultra GPU, packaged in the GB300 rack-scale design. AMD put in a strong showing with its latest MI325X GPUs. Intel proved that one can still do inference on CPUs with its Xeon submissions, but it also entered the GPU game with a submission of the Intel Arc Pro.

New benchmarks

In the last round, MLPerf introduced its largest benchmark yet, a large language model based on Llama3.1-405B. This round, the organizers topped themselves again with the addition of DeepSeek R1, a 671-billion-parameter model: more than 1.5 times the parameter count of the previous largest benchmark.

As a reasoning model, DeepSeek R1 works through several steps of a chain of thought when approaching a query. This means much more of the computation happens during inference than in a regular LLM run, which makes this benchmark all the more challenging. Reasoning models claim the highest accuracy, making them the technique of choice for science, math, and complex programming queries.
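A toy sketch can make the cost concrete. The model below is purely illustrative (it is not DeepSeek R1 or any real architecture): it only counts decode steps, where each chain-of-thought token is an extra forward pass attending over an ever-longer context, so inference work grows far beyond the length of the final answer.

```python
# Illustrative only: count decode-phase "work units" for a plain answer
# versus a chain-of-thought answer. One forward pass per generated token,
# each attending over everything generated so far.

def decode_cost(prompt_tokens: int, thought_tokens: int, answer_tokens: int) -> int:
    """Rough proxy for decode work: sum of context lengths across all steps."""
    total = 0
    context = prompt_tokens
    for _ in range(thought_tokens + answer_tokens):
        context += 1          # the new token joins the context
        total += context      # attention cost grows with context length
    return total

# Same 50-token answer; the reasoning run first emits 1,000 "thinking" tokens.
plain = decode_cost(prompt_tokens=200, thought_tokens=0, answer_tokens=50)
reasoning = decode_cost(prompt_tokens=200, thought_tokens=1000, answer_tokens=50)
```

Under these made-up numbers, the reasoning run does dozens of times more decode work than the plain run, which is why a reasoning-model benchmark stresses inference hardware so much harder.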

Alongside its largest LLM benchmark yet, MLPerf also introduced a smaller one, based on Llama3.1-8B. As Tran Ajyar, an MLPerf working-group lead, explained, small LLMs can deliver much of what larger models offer at far lower cost, making them an excellent choice for tasks such as text summarization and for edge applications.

That brings the total number of LLM benchmarks to four: the new Llama3.1-8B benchmark; the pre-existing Llama2-70B benchmark; the Llama3.1-405B benchmark introduced last round; and the largest, the new DeepSeek R1 benchmark. If nothing else, this signals that LLMs are not going anywhere.

Beyond the myriad LLMs, this round of MLPerf inference also included a new speech-to-text model, based on Whisper-large-v3. This benchmark is a response to the growing number of voice-enabled applications, whether smart devices or speech-based AI interfaces.

The MLPerf competition has two broad categories: “closed,” which requires using the reference neural-network model as-is, without modifications, and “open,” where some modifications to the model are allowed. Within these are many subcategories governing how the tests are run and on what kind of infrastructure. We will focus on the “closed” datacenter server results.

Nvidia on top

Perhaps unsurprisingly, the best per-accelerator performance on every benchmark, at least in the “server” category, was achieved by an Nvidia GPU-based system. Nvidia also unveiled the Blackwell Ultra, which topped the charts on the two largest benchmarks: Llama3.1-405B and DeepSeek R1.

Blackwell Ultra is a beefier iteration of the Blackwell architecture, featuring significantly more memory, twice the acceleration in the attention layers, 1.5x more AI compute, and faster memory and connectivity compared with standard Blackwell. It is aimed at larger AI workloads, such as the two benchmarks it was tested on.

Beyond the hardware improvements, Nvidia’s director of accelerated computing products, Dave Salvator, attributes Blackwell Ultra’s performance to two main changes. The first is the use of Nvidia’s proprietary 4-bit floating-point number format, NVFP4. “We can provide comparable accuracy to formats like BF16,” Salvator said.
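To give a feel for how a 4-bit floating-point format works, here is a minimal sketch of block-scaled FP4 quantization in the spirit of NVFP4. The E2M1 value grid is the standard FP4 one, but the block size and scale handling here are simplifying assumptions for illustration, not Nvidia’s actual implementation.

```python
# Sketch of block-scaled FP4 (E2M1) quantization, NVFP4-style.
# Assumptions: one float scale per block, round-to-nearest on the grid.

E2M1_GRID = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]  # positive FP4 magnitudes

def quantize_block(values, grid_max=6.0):
    """Quantize a block of floats: shared scale + nearest E2M1 magnitude each."""
    scale = max(abs(v) for v in values) / grid_max or 1.0  # avoid zero scale
    out = []
    for v in values:
        mag = min(E2M1_GRID, key=lambda g: abs(abs(v) / scale - g))
        out.append(mag * scale * (1.0 if v >= 0 else -1.0))  # dequantized value
    return out, scale

vals = [0.02, -0.11, 0.37, -0.50, 0.25, 0.08, -0.03, 0.44]
deq, scale = quantize_block(vals)
```

Each value is stored in just 4 bits plus a shared per-block scale, which is where the compute and memory savings over 16-bit formats like BF16 come from; the accuracy question is how much the rounding above hurts, which Nvidia reports is negligible in practice.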

The second is so-called disaggregated serving. The idea behind disaggregated serving is that inference workloads have two main parts: prefill, where the query (“Please summarize this report.”) and its full context window (the report) are loaded into the LLM, and generation/decoding, where the output is actually computed. These two phases have different requirements. While prefill is compute-heavy, generation/decoding depends much more on memory bandwidth. By assigning different groups of GPUs to the two different phases, Nvidia gained a performance improvement of roughly 50 percent, Salvator says.
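The split described above can be sketched as a toy scheduler: one worker pool drains the compute-bound prefill queue, and a separate pool handles the bandwidth-bound decode phase, with finished prefills handed over via a queue. Everything here (the `Request` class, worker counts, token costs) is an invented illustration, not Nvidia’s serving stack.

```python
# Toy disaggregated serving: separate prefill and decode "pools"
# connected by a queue. Illustrative only.
from collections import deque
from dataclasses import dataclass, field

@dataclass
class Request:
    rid: int
    prompt_tokens: int
    kv_cache: list = field(default_factory=list)  # stands in for real KV state
    output: list = field(default_factory=list)

def prefill(req: Request) -> Request:
    # Compute-bound: process the whole prompt in one pass, building the KV cache.
    req.kv_cache = [f"k{i}" for i in range(req.prompt_tokens)]
    return req

def decode(req: Request, max_new_tokens: int) -> Request:
    # Bandwidth-bound: one token at a time, re-reading the KV cache each step.
    for i in range(max_new_tokens):
        req.output.append(f"tok{i}")
    return req

prefill_queue = deque(Request(rid=i, prompt_tokens=100 * (i + 1)) for i in range(3))
decode_queue = deque()

while prefill_queue:                      # the "prefill GPUs" drain the prompt queue
    decode_queue.append(prefill(prefill_queue.popleft()))
done = [decode(r, max_new_tokens=4) for r in decode_queue]  # the "decode GPUs"
```

The payoff of the split is that each pool can be sized and tuned for its own bottleneck (raw FLOPs for prefill, memory bandwidth for decode) instead of every GPU doing both jobs badly.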

AMD close behind

AMD’s latest accelerator, the MI355X, launched in July. The company submitted results only in the “open” category, where software modifications to the model are allowed. Like Blackwell Ultra, the MI355X features 4-bit floating-point support, as well as expanded high-bandwidth memory. The MI355X outperformed its predecessor, the MI325X, on the Llama2-70B benchmark by a factor of 2.7, says Mahesh Balasubramanian, senior director of GPU product marketing at AMD.

AMD’s “closed” submissions included systems powered by AMD MI300X and MI325X GPUs. The more advanced MI325X machines performed comparably to those built with Nvidia H200s on the Llama2-70B, mixture-of-experts, and image-generation benchmarks.

This round also included the first hybrid submission, in which both AMD MI300X and MI325X GPUs were used for the same inference task, the Llama2-70B benchmark. Hybrid GPU use matters because new GPUs arrive on a yearly cadence, and the older models, already deployed at scale, are not going anywhere. The ability to spread workloads across different kinds of GPUs is therefore an essential step.

Intel enters the GPU game

In the past, Intel has been steadfast in its position that you don’t need a GPU to do machine learning. Indeed, submissions using Intel Xeon CPUs performed on par with the Nvidia L4 on the object-detection benchmark, but lagged behind on the recommender-system benchmark.

This round, for the first time, an Intel GPU also made a showing. The Intel Arc Pro was first released in 2022. The MLPerf submission featured a graphics card called the Maxsun Arc Pro B60 Dual 48G Turbo, which contains two GPUs and 48 gigabytes of memory. The system performed on par with Nvidia’s L40S on the small-LLM benchmark, but lagged behind it on the Llama2-70B benchmark.


2025-09-10 15:00:00
