Alibaba launches open source Qwen3, besting OpenAI's o1

The Qwen team at Chinese e-commerce and web giant Alibaba has launched Qwen3, a new family of open-source large language models that ranks among the state of the art for open models and approaches the performance of proprietary models from OpenAI and Google.
The Qwen3 series features two "mixture-of-experts" (MoE) models and six dense models, for a total of eight new models. The MoE approach combines many specialized sub-models within a single model, activating only the ones relevant to the task at hand among the model's internal settings (known as parameters); it was popularized by open-source French AI startup Mistral.
According to the team, the 235-billion-parameter Qwen3 model, codenamed A22B, outperforms DeepSeek's open-source R1 and OpenAI's proprietary o1 on key third-party benchmarks, including ArenaHard (with 500 user questions in software engineering and math), and comes close to the performance of Google's new proprietary Gemini 2.5 Pro.
Overall, the benchmark data positions Qwen3-235B-A22B as one of the most powerful publicly available models, achieving parity with or exceeding major industry offerings.
Hybrid reasoning (thinking)
Qwen3 models are trained to offer so-called "hybrid reasoning" or "dynamic reasoning," letting users toggle between fast, accurate responses and slower, more compute-intensive reasoning steps (similar to OpenAI's "o" series) for harder queries in science, math, engineering, and other fields. This approach was pioneered by Nous Research and other AI startups and research collectives.
With Qwen3, users can engage the more intensive "thinking mode" via a dedicated button in Qwen Chat, or by embedding the tags /think or /no_think in their prompts when deploying the model locally or through the API, allowing flexible use depending on the complexity of the task.
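As a rough illustration of how that toggle surfaces in code, here is a minimal sketch using Hugging Face's transformers library. The enable_thinking flag and the Qwen/Qwen3-0.6B model id follow Qwen's published examples, but treat the details as assumptions to verify against the official model card:

```python
# Minimal sketch of Qwen3's hybrid thinking toggle via Hugging Face
# transformers. The model id and the enable_thinking flag follow Qwen's
# published examples; verify the details against the official model card.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Qwen/Qwen3-0.6B"  # smallest dense variant, for illustration
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto")

messages = [{"role": "user", "content": "What is 17 * 23?"}]

# The chat template exposes the thinking switch; alternatively, appending
# /think or /no_think to a user message acts as a per-turn soft switch.
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=False,  # fast mode; True enables step-by-step reasoning
)
inputs = tokenizer(text, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0][inputs.input_ids.shape[-1]:],
                       skip_special_tokens=True))
```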
Users can now access and deploy the models via platforms such as Hugging Face, ModelScope, Kaggle, and GitHub, as well as interact with them directly through the Qwen Chat web interface and mobile apps. The release includes both the mixture-of-experts (MoE) and dense models, all available under the open-source Apache 2.0 license.
In my brief usage of Qwen Chat so far, it generated images relatively quickly and with decent prompt adherence, especially when matching styles with text incorporated into the image. However, it frequently required me to log in, and it is subject to the usual Chinese content restrictions (such as banning prompts or responses related to the Tiananmen Square protests).

In addition to the MoE offerings, Qwen3 includes six dense models at different scales: Qwen3-32B, Qwen3-14B, Qwen3-8B, Qwen3-4B, Qwen3-1.7B, and Qwen3-0.6B.
These models vary in size and architecture, offering users options to suit different needs and computational budgets.
Qwen3 models also greatly expand multilingual support, now covering 119 languages and dialects across major language families. This broadens the models' potential applications globally, facilitating research and deployment in a wide range of linguistic contexts.
Model training and architecture
In terms of training, Qwen3 represents a significant step up from its predecessor, Qwen2.5. The pretraining dataset roughly doubled in size, to about 36 trillion tokens.
Data sources include web crawls, extractions from PDF-like documents, and synthetic content generated with earlier Qwen models focused on math and coding.
The pipeline consists of a three-stage pretraining process, followed by a four-stage post-training refinement that enables the hybrid thinking and non-thinking capabilities. These training improvements allow Qwen3's dense base models to match or exceed the performance of much larger Qwen2.5 models.
Deployment options are versatile. Users can integrate Qwen3 models using frameworks like SGLang and vLLM, both of which offer OpenAI-compatible endpoints (a query sketch follows below).
For local use, options such as Ollama, LM Studio, MLX, llama.cpp, and KTransformers are recommended. In addition, users interested in the models' agentic capabilities are encouraged to explore the Qwen-Agent toolkit, which simplifies tool-calling operations.
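To make the endpoint claim concrete, here is a minimal sketch of querying a locally served Qwen3 model through such an OpenAI-compatible interface. The base URL, port, and model id are illustrative assumptions, with a server assumed to have been started first via vLLM:

```python
# Sketch of querying a locally served Qwen3 model through an
# OpenAI-compatible endpoint such as those exposed by vLLM or SGLang.
# Assumes a server was started first, e.g.: vllm serve Qwen/Qwen3-30B-A3B
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",  # local server, not OpenAI's cloud
    api_key="EMPTY",                      # local servers typically ignore the key
)

response = client.chat.completions.create(
    model="Qwen/Qwen3-30B-A3B",
    messages=[
        # The /no_think tag requests the fast, non-reasoning mode.
        {"role": "user",
         "content": "Summarize MoE routing in two sentences. /no_think"}
    ],
    temperature=0.7,
)
print(response.choices[0].message.content)
```

Because the endpoint follows the OpenAI API shape, existing client code can be pointed at it by changing only the base URL and model name.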
Junyang Lin, a member of the Qwen team, commented on X that building Qwen3 involved tackling critical but less glamorous technical challenges, such as scaling reinforcement learning stably, balancing data across multiple domains, and expanding multilingual performance without sacrificing quality.
Lin also noted that the team is shifting its focus toward training agents capable of long-horizon reasoning on real-world tasks.
What does this mean for enterprise decision-makers?
Engineering teams can point existing OpenAI-compatible endpoints at the new model in hours instead of weeks. The MoE checkpoints (235B parameters with 22B active, and 30B with 3B active) promise GPT-4-class reasoning at roughly the GPU memory cost of a 20-30B dense model.
Official LoRA and QLoRA hooks allow private fine-tuning without sending proprietary data to a third-party vendor; a sketch of what that looks like follows below.
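As a minimal sketch of such a private fine-tune, the following uses Hugging Face's peft library to attach LoRA adapters to a small Qwen3 dense model. The rank, target modules, and model choice are illustrative assumptions, not official Qwen recommendations:

```python
# Sketch of attaching LoRA adapters to a small Qwen3 dense model with
# Hugging Face peft, keeping all training data on local hardware.
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen3-0.6B")

lora_config = LoraConfig(
    r=16,                                  # low-rank adapter dimension
    lora_alpha=32,                         # adapter scaling factor
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],   # attention projections (assumed)
    task_type="CAUSAL_LM",
)

model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only adapter weights are trainable
# From here, a standard transformers Trainer run on in-house data updates
# just the adapters; the base weights stay frozen on local machines.
```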
Dense variants from 0.6B to 32B make it easy to prototype on a laptop and scale up to multi-GPU clusters without rewriting prompts.
Running the weights in-house means all prompts and outputs can be logged and audited. MoE sparsity reduces the number of active parameters per call, trimming the inference attack surface.
The Apache 2.0 license removes usage-based legal hurdles, though organizations should still review the export-control and governance implications of relying on a model trained by a China-based vendor.
At the same time, Qwen3 offers a viable alternative to other Chinese players, including DeepSeek, Tencent, and ByteDance, as well as the ever-growing roster of North American models from OpenAI, Google, Microsoft, Anthropic, Amazon, Meta, and others. The permissive Apache 2.0 license, which allows unlimited commercial use, is a big advantage over other open-source players such as Meta, whose licenses are more restrictive.
Moreover, the race among AI providers to deliver powerful and accessible models remains fiercely competitive, and smart organizations looking to reduce costs should stay flexible and open to evaluating new models for their AI agents and workflows.
Looking ahead
The Qwen team frames Qwen3 not merely as an incremental improvement but as a significant step toward its longer-term goals of artificial general intelligence (AGI) and artificial superintelligence (ASI), AI far smarter than humans.
Qwen's next plans include scaling data and model sizes further, extending context lengths, broadening modality support, and advancing reinforcement learning with environmental feedback mechanisms.
As the landscape of large-scale AI research continues to evolve, Qwen3's release of its weights under an accessible open license marks another important milestone, lowering the barriers for researchers, developers, and organizations aiming to innovate with state-of-the-art LLMs.