AI

Nvidia Blackwell Leads AI Inference, AMD Challenges

In the latest round of MLCommons' MLPerf inference benchmarks, machines built around Nvidia's new Blackwell GPU architecture outperformed all others. But AMD's latest spin on its Instinct GPUs, the MI325X, proved a match for the Nvidia H200, the product it was designed to take on. The two were most closely matched on the test of the Llama2 70B large language model (one with 70 billion parameters). And in an effort to keep up with a rapidly changing AI landscape, MLPerf added three new benchmarks to better reflect where machine learning is headed.

MLPerf runs benchmarks of machine learning systems in an effort to provide an apples-to-apples comparison between computer systems. Submitters use their own software and hardware, but the underlying neural networks must be the same. There are now a total of 11 benchmarks for servers, with three added this year.

It has been hard to keep up with the field, says Miro Hodak, co-chair of MLPerf Inference. ChatGPT appeared only in late 2022, OpenAI unveiled its first LLM that can reason through tasks last September, and LLMs have grown exponentially: GPT-3 had 175 billion parameters, while GPT-4 is thought to have nearly 2 trillion. As a result of this breakneck innovation, "the pace of putting out new benchmarks in this field has increased," says Hodak.

The new benchmarks include two LLMs. The popular and relatively compact Llama2-70B is already an MLPerf flagship, but the consortium wanted something that mimics the responsiveness people expect from chatbots today. So the new benchmark, "Llama2-70B Interactive," tightens the requirements: computers must produce at least 25 tokens per second under all circumstances and cannot take more than 450 milliseconds to begin an answer.
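The interactive benchmark's two constraints can be checked mechanically against a measured response. A minimal sketch, with function and variable names of my own choosing (not from MLPerf's harness), of testing one response against the limits described above:

```python
# Limits from the Llama2-70B Interactive benchmark described above.
MAX_TTFT_S = 0.450       # time to first token: at most 450 milliseconds
MIN_TOKENS_PER_S = 25.0  # sustained generation rate: at least 25 tokens/s

def meets_interactive_slo(ttft_s: float, total_tokens: int, total_time_s: float) -> bool:
    """Return True if a single response satisfies both latency constraints."""
    if ttft_s > MAX_TTFT_S:
        return False
    rate = total_tokens / total_time_s  # average tokens per second
    return rate >= MIN_TOKENS_PER_S

# A response that starts in 300 ms and emits 900 tokens in 30 s passes:
print(meets_interactive_slo(0.300, 900, 30.0))   # True
# One that takes 600 ms to start fails, regardless of throughput:
print(meets_interactive_slo(0.600, 2000, 30.0))  # False
```

In the real benchmark these limits must hold across every query in the run, not just on average, which is what makes the interactive variant so much harder than the original.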

Seeing the rise of "agentic AI," networks that can reason through complex tasks, MLPerf sought to test an LLM with some of the characteristics needed for that. It chose Llama3.1 405B for the job. That LLM has what's called a wide context window. That's a measure of how much information (documents, samples of code, and so on) it can take in at once. For Llama3.1 405B, that window is 128,000 tokens, more than 30 times as large as that of Llama2 70B.
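The "more than 30 times" figure follows directly from the two context windows. As a quick check (assuming Llama2-70B's standard 4,096-token window, which is not stated in the text above):

```python
llama2_ctx = 4_096     # Llama2-70B context window, in tokens (assumed standard spec)
llama31_ctx = 128_000  # Llama3.1 405B context window, in tokens (from the text)

ratio = llama31_ctx / llama2_ctx
print(ratio)  # 31.25, i.e. more than 30 times as large
```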

The final new benchmark, called RGAT, is what's known as a graph attention network. It acts to classify information in a network. For example, the dataset used to test RGAT consists of scientific papers, all of which have relationships among authors, institutions, and fields of study, making up 2 terabytes of data. RGAT must classify the papers into just under 3,000 topics.

Blackwell and Instinct Results

Nvidia continued to dominate the MLPerf benchmarks through both its own submissions and those of some 15 partners, such as Dell, Google, and Supermicro. Both its first- and second-generation Hopper architecture GPUs, the H100 and the memory-enhanced H200, put in strong showings. "We've managed to squeeze another 60 percent performance out of Hopper over the past year," Nvidia's Dave Salvator says of the architecture, which entered production in 2022. "It still has some headroom left in terms of performance."

But it was the GPU built on Nvidia's Blackwell architecture, the B200, that really dominated. "The only thing faster than Hopper is Blackwell," says Salvator. The B200 packs in 36 percent more high-bandwidth memory than the H200, but more importantly, it can perform key machine-learning math using numbers with a precision as low as 4 bits instead of 8. Lower-precision compute units are smaller, so more of them fit on the GPU, which leads to faster AI computing.
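The storage side of that precision argument is simple arithmetic: halving the bits per weight halves the memory the weights occupy (the compute-density benefit on the die is separate). A rough sketch, using a 70-billion-parameter model like Llama2-70B as the example:

```python
def weight_bytes(n_params: int, bits: int) -> int:
    """Bytes needed to store n_params weights at the given precision."""
    return n_params * bits // 8

n = 70_000_000_000             # a 70-billion-parameter model, as in Llama2-70B
at_8_bit = weight_bytes(n, 8)  # 70 GB of weights at 8-bit precision
at_4_bit = weight_bytes(n, 4)  # 35 GB of weights at 4-bit precision
print(at_8_bit // 10**9, at_4_bit // 10**9)  # 70 35
```

This sketch covers weights only; activations, the KV cache, and any higher-precision accumulation add to the real footprint.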

In the Llama3.1 405B benchmark, an eight-B200 system from Supermicro delivered nearly four times the tokens per second of an eight-H200 system from Cisco. And the same Supermicro system was three times as fast as the quickest H200 computer in the interactive version of Llama2-70B.

Nvidia used its combination of Blackwell GPUs and Grace CPUs, called the GB200, to demonstrate how well its NVL72 data links can merge multiple servers in a rack so they perform as if they were one giant GPU. In an unverified result the company shared with reporters, a full rack of GB200s delivered 869,200 tokens per second on Llama2 70B. The fastest system reported in this round of MLPerf was an Nvidia B200 server that delivered 98,443 tokens per second.

AMD positioned its latest Instinct GPU, the MI325X, as offering performance competitive with Nvidia's H200. The MI325X has the same architecture as its predecessor, the MI300, but adds more high-bandwidth memory and memory bandwidth: 288 gigabytes and 6 terabytes per second (boosts of 50 percent and 13 percent, respectively).
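Those two percentages can be recovered from the chips' specs. A quick check, assuming the MI300X's publicly listed figures of 192 GB of HBM and 5.3 TB/s of bandwidth (the predecessor's numbers are not stated in the text above):

```python
# Assumed predecessor specs (MI300X public figures, not from this article):
mi300x_mem_gb, mi300x_bw_tbs = 192, 5.3
# MI325X specs from the text:
mi325x_mem_gb, mi325x_bw_tbs = 288, 6.0

mem_boost = (mi325x_mem_gb / mi300x_mem_gb - 1) * 100   # percent more memory
bw_boost = (mi325x_bw_tbs / mi300x_bw_tbs - 1) * 100    # percent more bandwidth
print(round(mem_boost), round(bw_boost))  # 50 13
```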

Adding more memory is a play to handle larger and larger LLMs. "The larger models are able to take advantage of these GPUs because the model can fit in a single GPU or a single server," according to AMD. "When the model fits, you take out those interconnect hops, which slightly improves your latency." AMD was also able to take advantage of the extra memory by optimizing its software, boosting the inference speed of DeepSeek-R1 eightfold.
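The fit-on-one-GPU reasoning can be sketched as a back-of-the-envelope check. This is my own illustration, not AMD's sizing method, and the 20 percent overhead factor for KV cache and activations is an assumption:

```python
def fits_on_one_gpu(n_params: float, bits_per_weight: int, gpu_mem_gb: float,
                    overhead: float = 1.2) -> bool:
    """Rough check: model weights, plus an assumed ~20 percent overhead for
    KV cache and activations, versus a single GPU's memory capacity."""
    need_gb = n_params * bits_per_weight / 8 / 1e9 * overhead
    return need_gb <= gpu_mem_gb

# A 70B model at 8-bit precision needs roughly 84 GB with that overhead:
# too big for one 80 GB GPU, but comfortable in the MI325X's 288 GB.
print(fits_on_one_gpu(70e9, 8, 80))    # False
print(fits_on_one_gpu(70e9, 8, 288))   # True
```

When the check fails, the model must be split across GPUs or servers, and every cross-device hop adds the interconnect latency the quote above refers to.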

In the Llama2-70B test, MI325X computers came within 3 to 7 percent of similarly equipped H200 systems. On image generation, the MI325X systems were within 10 percent of the Nvidia H200 computers.

Another AMD mark this round came from its partner MangoBoost, which demonstrated nearly four times the performance in the Llama2-70B test by running the computation across four computers.

Intel has historically entered CPU-only systems in the inference competition, to show that for some workloads you don't really need a GPU. This round saw the first data from Intel's Xeon 6 chips, previously known as Granite Rapids and made using the company's 3-nanometer process. At 40,285 samples per second, the best image-recognition result for a dual Xeon 6 computer was about one-third the performance of a Cisco computer with two Nvidia H100s.

Compared with Xeon 5 results from October 2024, the new CPU delivers about an 80 percent boost on that benchmark, and even bigger boosts on object detection and medical imaging. Since it first began submitting Xeon results in 2021 (with Xeon 3), the company has achieved an 11-fold performance boost on ResNet.

For now, Intel appears to have left the field in the AI accelerator chip battle. Its alternative to the Nvidia H100, Gaudi 3, appeared neither in the new MLPerf results nor in version 4.1, released last October. Gaudi 3 got a delayed release because its software was not ready. In opening remarks at Intel Vision 2025, the company's invite-only customer conference, newly minted CEO Lip-Bu Tan seemed to apologize for Intel's AI efforts. "I'm not happy with our current position," he told attendees. "You're not happy either. I hear you loud and clear. We are working toward a competitive system. It won't happen overnight, but we will get there for you."

Google's TPU v6e chip also made a showing, though the results were confined to the image-generation task. At 5.48 queries per second, the 4-TPU system saw a 2.5x boost over a similar computer using its predecessor, the TPU v5e, in the October 2024 results. Nevertheless, 5.48 queries per second was roughly in line with a similarly sized Lenovo computer using Nvidia H100s.

2025-04-02 15:00:00
