
MoE Architecture Comparison: Qwen3 30B-A3B vs. GPT-OSS 20B

This article provides a technical comparison between two recently released mixture-of-experts (MoE) models: Qwen3 30B-A3B (released in April 2025) and OpenAI's GPT-OSS 20B (released in August 2025). The two models embody distinct approaches to MoE design, balancing computational efficiency against performance across different deployment scenarios.

Model overview

Feature | Qwen3 30B-A3B | GPT-OSS 20B
Total parameters | 30.5B | 21B
Active parameters | 3.3B | 3.6B
Number of layers | 48 | 24
MoE experts | 128 (8 active) | 32 (4 active)
Attention mechanism | Grouped-query attention (GQA) | Grouped multi-query attention
Query / key-value heads | 32Q / 4KV | 64Q / 8KV
Context window | 32,768 (262,144 extended) | 128,000
Vocabulary | 151,936 | o200k_harmony (~200K)
Quantization | Standard precision | Native MXFP4
Release date | April 2025 | August 2025

Sources: official Qwen3 documentation, OpenAI GPT-OSS documentation

Qwen3 30B-A3B Technical Specifications

Architecture details

Qwen3 30B-A3B uses a deep transformer architecture with 48 layers, each containing a mixture-of-experts block with 128 experts. The model activates 8 experts per token during inference, striking a balance between specialization and computational efficiency.
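
As a rough illustration of this kind of top-k expert routing, the sketch below selects 8 of 128 experts per token and mixes their outputs with softmax-normalized gate weights. The hidden sizes, router design, and expert MLP shape are illustrative assumptions, not the released Qwen3 implementation.

```python
# A minimal sketch of top-k MoE routing in the spirit of Qwen3's 128-expert /
# 8-active configuration. Sizes are toy values chosen for readability.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKMoE(nn.Module):
    def __init__(self, d_model=512, d_ff=1024, n_experts=128, top_k=8):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(d_model, n_experts, bias=False)
        self.experts = nn.ModuleList(
            [nn.Sequential(nn.Linear(d_model, d_ff), nn.SiLU(), nn.Linear(d_ff, d_model))
             for _ in range(n_experts)]
        )

    def forward(self, x):                        # x: (n_tokens, d_model)
        logits = self.router(x)                  # (n_tokens, n_experts)
        weights, idx = logits.topk(self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)     # renormalize over the 8 chosen experts
        out = torch.zeros_like(x)
        for slot in range(self.top_k):           # plain loops for clarity, not speed
            for e in idx[:, slot].unique().tolist():
                mask = idx[:, slot] == e
                out[mask] += weights[mask, slot].unsqueeze(-1) * self.experts[e](x[mask])
        return out

tokens = torch.randn(4, 512)
print(TopKMoE()(tokens).shape)                   # torch.Size([4, 512])
```

Because only 8 of 128 experts fire for any given token, just a small fraction of the expert parameters participates in each forward pass, which is how the model keeps roughly 3.3B active parameters out of 30.5B total.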

Attention mechanism

The model uses grouped-query attention (GQA) with 32 query heads and 4 key-value heads³. This design reduces memory usage, particularly the KV cache, while preserving attention quality for long-context processing.
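
The minimal sketch below shows how grouped-query attention shares key/value heads across groups of query heads, using the 32Q / 4KV split cited above; the hidden size, sequence length, and random projection matrices are placeholders for illustration.

```python
# Grouped-query attention sketch with 32 query heads sharing 4 key/value heads.
import torch
import torch.nn.functional as F

def grouped_query_attention(x, wq, wk, wv, n_q_heads=32, n_kv_heads=4):
    B, T, D = x.shape
    head_dim = D // n_q_heads
    q = (x @ wq).view(B, T, n_q_heads, head_dim).transpose(1, 2)
    k = (x @ wk).view(B, T, n_kv_heads, head_dim).transpose(1, 2)
    v = (x @ wv).view(B, T, n_kv_heads, head_dim).transpose(1, 2)
    # Each group of 32 / 4 = 8 query heads shares one key/value head.
    k = k.repeat_interleave(n_q_heads // n_kv_heads, dim=1)
    v = v.repeat_interleave(n_q_heads // n_kv_heads, dim=1)
    out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
    return out.transpose(1, 2).reshape(B, T, D)

D, head_dim = 4096, 128
x = torch.randn(1, 16, D)
wq = torch.randn(D, 32 * head_dim) * 0.02
wk = torch.randn(D, 4 * head_dim) * 0.02
wv = torch.randn(D, 4 * head_dim) * 0.02
print(grouped_query_attention(x, wq, wk, wv).shape)   # torch.Size([1, 16, 4096])
```

Only the 4 key/value heads need to be cached during generation, so the KV cache is roughly 8x smaller than it would be with one KV head per query head.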

Context and multilingual support

  • Native context length: 32,768 tokens
  • Extended context: up to 262,144 tokens (latest variants)
  • Multilingual support: 119 languages and dialects
  • Vocabulary: 151,936 tokens using a BPE tokenizer

Unique features

Qwen3 incorporates a hybrid thinking system that supports both "thinking" and "non-thinking" modes, allowing users to control how much computation is spent based on task complexity.
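
A minimal sketch of how this toggle is typically exposed through the Hugging Face chat template, assuming the enable_thinking flag described in the Qwen3 model card; verify the flag name against the tokenizer version you actually load.

```python
# Hedged sketch: toggling Qwen3's thinking mode via the Hugging Face chat template.
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("Qwen/Qwen3-30B-A3B")
messages = [{"role": "user", "content": "Summarize the MoE routing differences."}]

# Thinking mode on: the template makes room for a <think>...</think> block.
prompt_think = tok.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True, enable_thinking=True
)
# Thinking mode off: cheaper, direct answers for simple tasks.
prompt_fast = tok.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True, enable_thinking=False
)
```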

GPT-OSS 20B Technical Specifications

Architecture details

GPT-OSS 20B features a 24-layer transformer with 32 MoE experts per layer⁸. The model activates 4 experts per token, relying on fewer but wider experts for specialization.

Attention mechanism

The model implements grouped multi-query attention with 64 query heads and 8 key-value heads, arranged in groups of 8¹⁰. This configuration supports efficient inference while maintaining attention quality across the wider architecture.

Context and optimization

  • Native context length: 128,000 tokens
  • Quantization: native MXFP4 (4.25-bit) for the MoE weights
  • Memory efficiency: runs within 16 GB of memory when quantized
  • Tokenizer: o200k_harmony (a superset of the GPT-4o tokenizer)

Attention patterns

GPT-OSS 20B alternates dense and locally banded sparse attention patterns, similar to GPT-3, and uses rotary position embeddings (RoPE) for positional encoding.
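
The sketch below illustrates the alternating layout with causal masks: dense layers attend to the full prefix, while banded layers restrict each token to a local window. The window size of 128 and the even/odd alternation are assumptions for illustration; only the dense/banded alternation itself comes from the description above.

```python
# Sketch of alternating dense and locally banded causal attention masks.
import torch

def attention_mask(seq_len, kind, window=128):
    causal = torch.tril(torch.ones(seq_len, seq_len, dtype=torch.bool))
    if kind == "dense":
        return causal
    # Banded/local: each token attends only to the most recent `window` positions.
    offsets = torch.arange(seq_len)[:, None] - torch.arange(seq_len)[None, :]
    return causal & (offsets < window)

layer_kinds = ["local" if i % 2 == 0 else "dense" for i in range(24)]
dense = attention_mask(256, "dense")
local = attention_mask(256, "local")
print(layer_kinds[:4], int(dense.sum()), int(local.sum()))  # dense keeps more positions
```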

Comparing architectural philosophies

Depth versus width

Qwen3 30B-A3B emphasizes depth and expert diversity:

  • 48 layers enable multi-stage processing and hierarchical abstraction
  • 128 experts per layer provide fine-grained specialization
  • Well suited to complex reasoning tasks that require deep processing

GPT-OSS 20B prioritizes width and computational density:

  • 24 layers with larger experts increase the representational capacity of each layer
  • Fewer but more powerful experts (32 versus 128) raise per-expert capacity
  • Optimized for efficient single-pass inference

MoE routing strategies

Qwen3: routes each token through 8 of 128 experts, encouraging diverse, context-sensitive processing paths and robust routing decisions.

GPT-OSS: routes each token through 4 of 32 experts, maximizing the compute devoted to each expert and concentrating processing at every reasoning step. A quick comparison of the two configurations follows below.
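
The back-of-the-envelope comparison below uses only the expert counts from the table above; the "active fraction" counts experts alone and ignores shared attention and embedding parameters, so it is a rough proxy rather than a precise measure.

```python
# Rough comparison of the two routing setups using figures from the spec table.
configs = {
    "Qwen3 30B-A3B": {"experts": 128, "active": 8, "layers": 48},
    "GPT-OSS 20B":   {"experts": 32,  "active": 4, "layers": 24},
}
for name, c in configs.items():
    frac = c["active"] / c["experts"]
    print(f'{name}: {c["active"]} of {c["experts"]} experts per token '
          f'({frac:.1%} of expert capacity) across {c["layers"]} layers')
```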

Memory and deployment considerations

Qwen3 30B-A3B

  • Memory requirements: vary with precision and context length
  • Deployment: suited to both cloud and edge deployment, with flexible context extension
  • Quantization: supports various post-training quantization schemes

GPT-OSS 20B

  • Memory requirements: ~16 GB with native MXFP4 quantization, ~48 GB in bfloat16 (see the rough estimate after this list)
  • Deployment: designed for consumer-hardware compatibility
  • Quantization: native MXFP4 training enables efficient inference without quality degradation
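
As a sanity check on those figures, the sketch below computes a weights-only lower bound from the parameter count and bits per weight. It ignores the non-MoE weights kept at higher precision, the KV cache, and runtime overhead, which is why the practical numbers quoted above are somewhat higher.

```python
# Weights-only lower bound; MXFP4 is taken as ~4.25 bits/weight (4-bit values
# plus shared block scales), per the figure quoted in the text.
def weight_gib(n_params, bits_per_weight):
    return n_params * bits_per_weight / 8 / 2**30

n_params = 21e9
print(f"MXFP4:    ~{weight_gib(n_params, 4.25):.1f} GiB of weights")
print(f"bfloat16: ~{weight_gib(n_params, 16):.1f} GiB of weights")
```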

Performance characteristics

Qwen3 30B-A3B

  • Excels at mathematical reasoning, coding, and complex logical tasks
  • Strong performance in multilingual scenarios across 119 languages
  • Thinking mode benefits complicated reasoning problems

GPT-OSS 20B

  • Achieves performance comparable to OpenAI o3-mini on standard benchmarks
  • Optimized for tool use, web browsing, and function calling
  • Strong chain-of-thought reasoning with adjustable reasoning-effort levels (a usage sketch follows this list)
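
A hedged sketch of requesting higher reasoning effort from a locally served gpt-oss-20b through an OpenAI-compatible endpoint. The "Reasoning: high" system-message convention, the endpoint URL, and the served model name are assumptions based on the GPT-OSS documentation; check the official harmony format reference for the exact syntax.

```python
# Hedged sketch: adjustable reasoning effort via an OpenAI-compatible local server
# (e.g. vLLM or Ollama serving gpt-oss-20b); names and URL are assumptions.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")
resp = client.chat.completions.create(
    model="gpt-oss-20b",
    messages=[
        {"role": "system", "content": "Reasoning: high"},
        {"role": "user", "content": "Outline a tool-use plan to compare two MoE models."},
    ],
)
print(resp.choices[0].message.content)
```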

Use case recommendations

Choose Qwen3 30B-A3B for:

  • Complex reasoning tasks that require multi-stage processing
  • Multilingual applications across diverse languages
  • Scenarios that require flexible context-length extension
  • Applications where reasoning/thinking transparency is valued

Choose GPT-OSS 20B for:

  • Resource-constrained deployments that demand efficiency
  • Tool use and agentic applications
  • Fast inference with consistent performance
  • Edge deployment scenarios with limited memory

Conclusion

Qwen3 30B-A3B and GPT-OSS 20B represent complementary approaches to MoE architecture design. Qwen3 emphasizes depth, expert diversity, and multilingual capability, making it well suited to complex reasoning applications. GPT-OSS 20B prioritizes efficiency, tool integration, and deployment flexibility, positioning it for practical production environments with resource constraints.

Both models show how MoE architectures have evolved beyond simple parameter scaling, incorporating design choices that align architectural decisions with intended use cases and deployment scenarios.

Note: This article was inspired by a Reddit post and the diagram shared by Sebastian Raschka.


Sources

  1. Qwen3 30B-A3B – Hugging Face
  2. Qwen3 Technical Blog
  3. Qwen3 30B-A3B Specifications
  4. Qwen3 30B-A3B Instruct 2507
  5. Qwen3 Official Documentation
  6. Qwen Tokenizer Documentation
  7. Qwen3 Features
  8. OpenAI GPT-OSS Introduction
  9. GPT-OSS GitHub Repository
  10. GPT-OSS 20B – Groq Documentation
  11. OpenAI GPT-OSS Technical Details
  12. GPT-OSS – Hugging Face
  13. OpenAI GPT-OSS 20B
  14. OpenAI GPT-OSS Introduction
  15. NVIDIA GPT-OSS Technical Blog
  16. GPT-OSS – Hugging Face
  17. Qwen3 Performance Analysis
  18. OpenAI GPT-OSS
  19. GPT-OSS 20B Capabilities


Michal Sutter is a data science professional with a master's degree in Data Science from the University of Padova. With a solid foundation in statistical analysis, machine learning, and data engineering, Michal excels at transforming complex datasets into actionable insights.


