
Alibaba’s new Qwen3-235B-A22B-2507 beats Kimi K2, Claude Opus 4




Alibaba, the Chinese e-commerce giant, has made waves in tech and business circles worldwide with its “Qwen” family of large AI language models, beginning with the launch of the original Tongyi Qianwen LLM chatbot in April 2023 and running through the Qwen 3 release in April 2025.

Why?

Not only do the models score highly on third-party benchmark tests for math, science, reasoning and writing tasks, but for the most part they have been released under permissive open-source licensing terms, allowing organizations and enterprises to download, customize and use them however they like, including commercially. Think of them as an alternative to DeepSeek.

This week, Alibaba’s “Qwen Team,” as its AI division is known, released the latest updates to the Qwen family, and they are already attracting attention once again from AI power users in the West for their top performance, in one case edging out even the new Kimi K2 model from rival Chinese startup Moonshot AI released in mid-July 2025.




The new Qwen3-235B-A22B-Instruct-2507, released on the AI code-sharing community Hugging Face alongside a “floating point 8” or FP8 version (which we’ll cover in more depth below), improves on the original Qwen 3 in reasoning tasks, factual accuracy and multilingual understanding. It also outperforms the “non-thinking” version of Claude Opus 4.

The new Qwen3 update also delivers better coding results, alignment with user preferences, and long-context handling, according to its creators. But that’s not all…

Read on for what it means for enterprise users and technical decision-makers.

The FP8 version lets enterprises run Qwen 3 with far less memory and compute

Alongside the new Qwen3-235B-A22B-2507, the Qwen team has released an “FP8” version, short for 8-bit floating point, a format that compresses the model’s numerical operations to use less memory and processing power, without meaningfully affecting its performance.

In practice, this means organizations can run a model with Qwen3’s capabilities on smaller, less expensive hardware, or more efficiently in the cloud. The result is faster response times, lower energy costs, and the ability to scale deployments without massive infrastructure.

This makes the FP8 model particularly attractive for production environments with tight latency or cost constraints. Teams can scale Qwen3’s capabilities to single-node GPU instances or local development machines, avoiding the need for massive multi-GPU clusters. It also lowers the barrier to private fine-tuning and on-premises deployments, where infrastructure resources are finite and total cost of ownership matters.
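For teams that want to experiment with this, here is a minimal serving sketch using vLLM. The Hugging Face repo ID, GPU count and context cap are illustrative assumptions based on the figures discussed below, not official deployment guidance.

```python
# Minimal sketch: serving the FP8 checkpoint with vLLM.
# Repo ID, tensor_parallel_size and max_model_len are assumptions, not official settings.
from vllm import LLM, SamplingParams

llm = LLM(
    model="Qwen/Qwen3-235B-A22B-Instruct-2507-FP8",  # assumed Hugging Face repo ID
    tensor_parallel_size=4,   # the FP8 build is reported to fit on 4 x H100-80GB (TP-4)
    max_model_len=32768,      # cap the context to keep KV-cache memory in check
)

params = SamplingParams(temperature=0.7, top_p=0.8, max_tokens=512)
outputs = llm.generate(["Summarize the trade-offs of FP8 quantization."], params)
print(outputs[0].outputs[0].text)
```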

Although the Qwen team did not publish official figures, comparisons with similar FP8 quantized deployments suggest the efficiency savings are substantial. Here’s a working illustration (updated and corrected on 07/23/2025 at 4:04 pm ET. This piece originally included an inaccurate chart based on a miscalculation; I apologize for the errors and thank the readers who called me out on them):

| Metric | BF16 / BF16-equivalent build | FP8 quantized build |
| --- | --- | --- |
| GPU memory use* | ≈640 GB (8 × H100-80 GB, TP-8) | ≈320 GB total on 4 × H100-80 GB, TP-4; lower-spec option: ~143 GB across 2 × H100 with Ollama off-loading |
| Single-query inference speed† | ~74 tokens/s (batch = 1, context = 2K, 8 × H20-96 GB, TP-8) | ~72 tokens/s (same settings, 4 × H20-96 GB, TP-4) |
| Power / energy | A full node of eight H100s draws ~4-4.5 kW under load (550-600 W per card, plus host)‡ | FP8 needs half the cards and moves half the data; NVIDIA FP8 case studies report ≈35-40% lower TCO and energy at similar throughput |
| GPUs required (practical) | 8 × H100-80 GB (TP-8), or 8 × A100-80 GB equivalent | 4 × H100-80 GB (TP-4); 2 × H100 possible with aggressive off-loading, at the cost of latency |

*Disk footprint for the checkpoints: BF16 weights are ~500 GB; the FP8 checkpoint is “over 200 GB,” so the absolute GPU memory savings come mostly from needing fewer cards, not from the weights alone.

† Speed numbers are from the official SGLang benchmarks for Qwen3 (batch = 1). Throughput scales with batch size: Baseten measured ~45 tokens/s per user at batch 32 and ~1.4k tokens/s total on the same four-GPU FP8 setup.

‡ No vendor publishes exact wall-power numbers for Qwen, so we approximate using H100 specifications and NVIDIA Hopper FP8 data.
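As a sanity check on the disk-footprint footnote, the figures line up with simple bytes-per-parameter arithmetic. The sketch below covers weights only, ignoring KV cache and activation headroom, which is why whole extra cards are still needed in practice.

```python
# Back-of-envelope check of the checkpoint sizes cited above.
TOTAL_PARAMS = 235e9  # 235B total parameters (MoE; ~22B active per token)

def weight_gb(bytes_per_param: float) -> float:
    """Approximate checkpoint size in GB for a given numeric precision."""
    return TOTAL_PARAMS * bytes_per_param / 1e9

print(f"BF16 (2 bytes/param): ~{weight_gb(2):.0f} GB")  # ~470 GB, close to the ~500 GB cited
print(f"FP8  (1 byte/param):  ~{weight_gb(1):.0f} GB")  # ~235 GB, close to the 'over 200 GB' cited
```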

No more “hybrid reasoning”… instead, Qwen will release separate instruct and reasoning models!

Perhaps most interesting of all, the Qwen team announced it will no longer pursue a “hybrid” reasoning approach, which it introduced with Qwen 3 in April and which appeared to be inspired by an approach pioneered by the sovereign AI collective Nous Research.

This allowed users to toggle on a “reasoning” mode, letting the AI model engage in self-checking and produce “chains of thought” before responding.

In a way, it was designed to mimic the reasoning capabilities of powerful proprietary models such as OpenAI’s “o” series (o1, o3, o4-mini, o4-mini-high), which also produce “chains of thought.”

However, unlike those rival models, which always engage in such “reasoning” for every prompt, Qwen 3’s reasoning mode could be manually switched on or off by the user, either by clicking a “Thinking Mode” button on the Qwen Chat website or by typing “/think” before a prompt when running the model locally or through the API.

The idea was to give users control: engage the slower, more token-intensive thinking mode for harder prompts and tasks, and use the non-thinking mode for simpler ones. But this put the burden of that decision on the user. While flexible, it also introduced design complexity and inconsistent behavior in some cases.
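For reference, on the original hybrid Qwen 3 release that toggle was also exposed programmatically through the Hugging Face chat template. A minimal sketch follows, assuming the `enable_thinking` keyword described on the Qwen3 model card; the new 2507 Instruct model drops the toggle entirely.

```python
# Sketch of the old hybrid-reasoning toggle on the original Qwen3 release.
# The enable_thinking kwarg follows the Qwen3 model card; the 2507 model removes it.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-235B-A22B")  # original hybrid model

messages = [{"role": "user", "content": "Prove that the sum of two even numbers is even."}]

# Thinking on: the template inserts the chain-of-thought scaffold before the reply.
with_thinking = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True, enable_thinking=True
)

# Thinking off: same prompt, no chain-of-thought block.
without_thinking = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True, enable_thinking=False
)
```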

Now, the Qwen team has written in its announcement post on X:

“After talking with the community and thinking it through, we decided to stop using hybrid thinking mode. Instead, we will train Instruct and Thinking models separately so we can get the best quality possible.”

With the 2507 update (an instruct, or non-reasoning, model only, for now), Alibaba is no longer straddling both behaviors in a single model. Instead, separate model variants will be trained for instruction and reasoning tasks respectively.

The result is a model that adheres more closely to user instructions, generates more predictable responses, and, as the benchmark data shows, improves significantly across multiple evaluation domains.

Performance benchmarks and use cases

Compared to its predecessor, the Qwen3-235B-A22B-Instruct-2507 model delivers measurable improvements:

  • MMLU-Pro rises from 75.2 to 83.0, a notable gain in general knowledge performance.
  • GPQA and SuperGPQA benchmarks improve by 15-20 percentage points, reflecting stronger factual accuracy.
  • Reasoning tasks such as AIME25 and ARC-AGI show more than double the previous performance.
  • Code generation improves, with LiveCodeBench scores rising from 32.9 to 51.8.
  • Multilingual support expands, aided by improved coverage of long-tail languages and better alignment across dialects.

The model retains a mixture-of-experts (MoE) architecture, activating 8 of 128 experts during inference, with a total of 235 billion parameters, 22 billion of which are active at any one time.
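As a toy illustration of what that sparse activation means (this is not Qwen’s actual routing code), a router scores all 128 experts for each token and keeps only the top 8, so most parameters sit idle on any given forward pass:

```python
# Toy mixture-of-experts gating sketch: 8 of 128 experts selected per token.
import numpy as np

NUM_EXPERTS, TOP_K, HIDDEN = 128, 8, 16  # HIDDEN shrunk for the toy example

rng = np.random.default_rng(0)
router_weights = rng.normal(size=(HIDDEN, NUM_EXPERTS))
token = rng.normal(size=HIDDEN)

logits = token @ router_weights              # one routing score per expert
top_k = np.argsort(logits)[-TOP_K:]          # indices of the 8 selected experts
gates = np.exp(logits[top_k] - logits[top_k].max())
gates /= gates.sum()                         # softmax over the selected experts only

print("active experts:", sorted(top_k.tolist()))
print("gate weights:", np.round(gates, 3))
```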

As mentioned earlier, the FP8 version introduces fine-grained quantization for faster inference and reduced memory use.

Enterprise-ready by design

Unlike many open-source LLMs, which are often released under research-only licenses or require API access for commercial use, Qwen3 is aimed squarely at enterprise deployment.

It ships under the permissive Apache 2.0 license, meaning companies can use it freely for commercial applications. They can also:

  • Deploy models locally or behind OpenAI-compatible API endpoints using vLLM and SGLang (a minimal serving sketch follows this list)
  • Fine-tune models privately using LoRA or QLoRA without exposing proprietary data
  • Log and inspect all prompts and outputs on-premises for compliance and auditing
  • Scale from prototype to production using dense variants (from 0.6B to 32B) or MoE checkpoints
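Here is a minimal sketch of that OpenAI-compatible path, assuming a locally hosted vLLM server and the instruct repo ID (both illustrative):

```python
# Sketch: serve the checkpoint with vLLM, then point any OpenAI client at the local endpoint.
# Launch the server first (assumed repo ID and GPU count):
#   vllm serve Qwen/Qwen3-235B-A22B-Instruct-2507 --tensor-parallel-size 8
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")  # no key needed locally

resp = client.chat.completions.create(
    model="Qwen/Qwen3-235B-A22B-Instruct-2507",
    messages=[{"role": "user", "content": "Draft a refund policy for a SaaS product."}],
    temperature=0.7,
)
print(resp.choices[0].message.content)
```

Because prompts never leave the local network in this setup, logging and auditing can happen entirely on-premises, which is the compliance point made in the list above.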

Alibaba’s team has also introduced Qwen-Agent, a lightweight framework that abstracts tool-calling logic for users building agentic systems.

Benchmarks such as TAU-Retail and BFCL-v3 suggest the instruction-tuned model can execute multi-step decision-making tasks competently, work typically reserved for purpose-built agents.
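A hedged sketch of what an agentic setup might look like with Qwen-Agent is below; the configuration keys and built-in tool name follow the project’s public README and may differ across versions, and the model name is whatever your local server exposes.

```python
# Hedged Qwen-Agent sketch; config keys and tool names per the project's README.
from qwen_agent.agents import Assistant

llm_cfg = {
    "model": "Qwen3-235B-A22B-Instruct-2507",    # assumed name exposed by your server
    "model_server": "http://localhost:8000/v1",  # any OpenAI-compatible endpoint
    "api_key": "EMPTY",
}

bot = Assistant(
    llm=llm_cfg,
    function_list=["code_interpreter"],  # built-in tool shipped with Qwen-Agent
    system_message="You are a data analyst. Use tools when calculations are needed.",
)

messages = [{"role": "user", "content": "Compute the mean and variance of the numbers 1 to 100."}]
responses = []
for responses in bot.run(messages=messages):  # run() streams intermediate agent steps
    pass
print(responses[-1]["content"])  # final assistant reply
```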

Community and industry reactions

The release has already been well received by AI power users.

Paul Couvert, AI educator and founder of private LLM chatbot Blue Shell AI, posted a comparison chart showing Qwen3-235B-A22B-Instruct-2507 to be “stronger than Kimi K2… and even better than Claude Opus 4.”

AI influencer NIK (@ns123abc) commented on its rapid impact, opening his post with the quip: “You are laughing.”

Meanwhile, Jeff Boudier of Hugging Face highlighted the deployment benefits: “Qwen released a massive improvement to Qwen3… it tops the best open (Kimi K2, a 4x larger model) and closed (Claude Opus 4) LLMs on benchmarks.”

He praised the availability of an FP8 checkpoint for faster inference, one-click deployment on Azure ML, and support for local use via MLX on Mac or INT4 builds from Intel.

The overall tone from developers has been enthusiastic, as the model’s balance of performance, licensing and deployability appeals to hobbyists and professionals alike.

What’s next for the Qwen team?

Alibaba is already laying the groundwork for future updates. A separate reasoning-focused model is in the pipeline, and the Qwen roadmap points toward increasingly agentic systems capable of long-horizon task planning.

Multimodal support, already seen in the Qwen2.5-Omni and Qwen-VL models, is also expected to expand.

Indeed, rumors and rumblings have already started as Qwen team members tease yet another update to their model family, with changes to their web properties revealing URL strings for a new Qwen3-Coder-480B-A35B-Instruct model, most likely a 480-billion-parameter mixture-of-experts.

What Qwen3-235B-A22B-Instruct-2507 ultimately signals is not just another leap in benchmark performance, but the maturing of open models into viable alternatives to proprietary systems.

Its deployment flexibility, strong general performance and permissive licensing give the model a distinct edge in a crowded field.

For teams looking to integrate advanced instruction-following models into their AI stack without vendor lock-in or usage-based fees, Qwen3 is a serious contender.




