
MoE Architecture Comparison: Qwen3 30B-A3B vs. GPT-OSS 20B

This article provides a technical comparison between two recently released mixture-of-experts (MoE) models: Qwen3 30B-A3B (released in April 2025) and OpenAI's GPT-OSS 20B (released in August 2025). The two models embody distinct approaches to MoE design, balancing computational efficiency against performance across different deployment scenarios.

Model overview

Feature | Qwen3 30B-A3B | GPT-OSS 20B
Total parameters | 30.5B | 21B
Active parameters | 3.3B | 3.6B
Number of layers | 48 | 24
MoE experts | 128 (8 active) | 32 (4 active)
Attention mechanism | Grouped-query attention (GQA) | Grouped multi-query attention
Query / key-value heads | 32Q / 4KV | 64Q / 8KV
Context window | 32,768 (262,144 extended) | 128,000
Vocabulary | 151,936 | o200k_harmony (~200K)
Quantization | Standard precision | Native MXFP4
Release date | April 2025 | August 2025

Sources: official Qwen3 documentation, OpenAI GPT-OSS documentation

Qwen3 30B-A3B Technical Specifications

Architecture details

Qwen3 30B-A3B uses a deep transformer architecture with 48 layers, each containing a mixture-of-experts block with 128 experts. The model activates 8 experts per token during inference, striking a balance between specialization and computational efficiency.
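
As a rough illustration of this kind of top-k expert routing, the sketch below selects 8 of 128 experts per token and mixes their outputs with softmax-normalized gate weights. The hidden sizes, router design, and expert MLP shape are illustrative assumptions, not the released Qwen3 implementation.

```python
# A minimal sketch of top-k MoE routing in the spirit of Qwen3's 128-expert /
# 8-active configuration. Sizes are toy values chosen for readability.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKMoE(nn.Module):
    def __init__(self, d_model=512, d_ff=1024, n_experts=128, top_k=8):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(d_model, n_experts, bias=False)
        self.experts = nn.ModuleList(
            [nn.Sequential(nn.Linear(d_model, d_ff), nn.SiLU(), nn.Linear(d_ff, d_model))
             for _ in range(n_experts)]
        )

    def forward(self, x):                        # x: (n_tokens, d_model)
        logits = self.router(x)                  # (n_tokens, n_experts)
        weights, idx = logits.topk(self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)     # renormalize over the 8 chosen experts
        out = torch.zeros_like(x)
        for slot in range(self.top_k):           # plain loops for clarity, not speed
            for e in idx[:, slot].unique().tolist():
                mask = idx[:, slot] == e
                out[mask] += weights[mask, slot].unsqueeze(-1) * self.experts[e](x[mask])
        return out

tokens = torch.randn(4, 512)
print(TopKMoE()(tokens).shape)                   # torch.Size([4, 512])
```

Because only 8 of 128 experts fire for any given token, just a small fraction of the expert parameters participates in each forward pass, which is how the model keeps roughly 3.3B active parameters out of 30.5B total.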

Attention mechanism

The model uses grouped-query attention (GQA) with 32 query heads and 4 key-value heads³. This design reduces memory usage, particularly the KV cache, while preserving attention quality for long-context processing.
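
The minimal sketch below shows how grouped-query attention shares key/value heads across groups of query heads, using the 32Q / 4KV split cited above; the hidden size, sequence length, and random projection matrices are placeholders for illustration.

```python
# Grouped-query attention sketch with 32 query heads sharing 4 key/value heads.
import torch
import torch.nn.functional as F

def grouped_query_attention(x, wq, wk, wv, n_q_heads=32, n_kv_heads=4):
    B, T, D = x.shape
    head_dim = D // n_q_heads
    q = (x @ wq).view(B, T, n_q_heads, head_dim).transpose(1, 2)
    k = (x @ wk).view(B, T, n_kv_heads, head_dim).transpose(1, 2)
    v = (x @ wv).view(B, T, n_kv_heads, head_dim).transpose(1, 2)
    # Each group of 32 / 4 = 8 query heads shares one key/value head.
    k = k.repeat_interleave(n_q_heads // n_kv_heads, dim=1)
    v = v.repeat_interleave(n_q_heads // n_kv_heads, dim=1)
    out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
    return out.transpose(1, 2).reshape(B, T, D)

D, head_dim = 4096, 128
x = torch.randn(1, 16, D)
wq = torch.randn(D, 32 * head_dim) * 0.02
wk = torch.randn(D, 4 * head_dim) * 0.02
wv = torch.randn(D, 4 * head_dim) * 0.02
print(grouped_query_attention(x, wq, wk, wv).shape)   # torch.Size([1, 16, 4096])
```

Only the 4 key/value heads need to be cached during generation, so the KV cache is roughly 8x smaller than it would be with one KV head per query head.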

Context and multilingual support

  • Native context length: 32,768 tokens
  • Extended context: up to 262,144 tokens (latest variants)
  • Multilingual support: 119 languages and dialects
  • Vocabulary: 151,936 tokens using a BPE tokenizer

Unique features

Qwen3 incorporates a hybrid thinking system that supports both "thinking" and "non-thinking" modes, allowing users to control how much computation is spent based on task complexity.
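
A minimal sketch of how this toggle is typically exposed through the Hugging Face chat template, assuming the enable_thinking flag described in the Qwen3 model card; verify the flag name against the tokenizer version you actually load.

```python
# Hedged sketch: toggling Qwen3's thinking mode via the Hugging Face chat template.
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("Qwen/Qwen3-30B-A3B")
messages = [{"role": "user", "content": "Summarize the MoE routing differences."}]

# Thinking mode on: the template makes room for a <think>...</think> block.
prompt_think = tok.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True, enable_thinking=True
)
# Thinking mode off: cheaper, direct answers for simple tasks.
prompt_fast = tok.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True, enable_thinking=False
)
```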

GPT-OSS 20B Technical Specifications

Architecture details

GPT-OSS 20B features a 24-layer transformer with 32 MoE experts per layer⁸. The model activates 4 experts per token, relying on fewer but wider experts for specialization.

Attention mechanism

The model implements grouped multi-query attention with 64 query heads and 8 key-value heads, arranged in groups of 8¹⁰. This configuration supports efficient inference while maintaining attention quality across the wider architecture.

Context and optimization

  • Native context length: 128,000 tokens
  • Quantization: native MXFP4 (4.25-bit) for the MoE weights
  • Memory efficiency: runs within 16 GB of memory when quantized
  • Tokenizer: o200k_harmony (a superset of the GPT-4o tokenizer)

Attention patterns

GPT-OSS 20B alternates dense and locally banded sparse attention patterns, similar to GPT-3, and uses rotary position embeddings (RoPE) for positional encoding.
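
The sketch below illustrates the alternating layout with causal masks: dense layers attend to the full prefix, while banded layers restrict each token to a local window. The window size of 128 and the even/odd alternation are assumptions for illustration; only the dense/banded alternation itself comes from the description above.

```python
# Sketch of alternating dense and locally banded causal attention masks.
import torch

def attention_mask(seq_len, kind, window=128):
    causal = torch.tril(torch.ones(seq_len, seq_len, dtype=torch.bool))
    if kind == "dense":
        return causal
    # Banded/local: each token attends only to the most recent `window` positions.
    offsets = torch.arange(seq_len)[:, None] - torch.arange(seq_len)[None, :]
    return causal & (offsets < window)

layer_kinds = ["local" if i % 2 == 0 else "dense" for i in range(24)]
dense = attention_mask(256, "dense")
local = attention_mask(256, "local")
print(layer_kinds[:4], int(dense.sum()), int(local.sum()))  # dense keeps more positions
```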

Comparing architectural philosophies

Depth versus width

Qwen3 30B-A3B emphasizes depth and expert diversity:

  • 48 layers enable multi-stage processing and hierarchical abstraction
  • 128 experts per layer provide fine-grained specialization
  • Well suited to complex reasoning tasks that require deep processing

GPT-OSS 20B prioritizes width and computational density:

  • 24 layers with larger experts increase the representational capacity of each layer
  • Fewer but more powerful experts (32 versus 128) raise per-expert capacity
  • Optimized for efficient single-pass inference

MoE routing strategies

Qwen3: routes each token through 8 of 128 experts, encouraging diverse, context-sensitive processing paths and robust routing decisions.

GPT-OSS: routes each token through 4 of 32 experts, maximizing the compute devoted to each expert and concentrating processing at every reasoning step. A quick comparison of the two configurations follows below.
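
The back-of-the-envelope comparison below uses only the expert counts from the table above; the "active fraction" counts experts alone and ignores shared attention and embedding parameters, so it is a rough proxy rather than a precise measure.

```python
# Rough comparison of the two routing setups using figures from the spec table.
configs = {
    "Qwen3 30B-A3B": {"experts": 128, "active": 8, "layers": 48},
    "GPT-OSS 20B":   {"experts": 32,  "active": 4, "layers": 24},
}
for name, c in configs.items():
    frac = c["active"] / c["experts"]
    print(f'{name}: {c["active"]} of {c["experts"]} experts per token '
          f'({frac:.1%} of expert capacity) across {c["layers"]} layers')
```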

Memory and deployment considerations

Qwen3 30B-A3B

  • Memory requirements: vary with precision and context length
  • Deployment: suited to both cloud and edge deployment, with flexible context extension
  • Quantization: supports various post-training quantization schemes

GPT-OSS 20B

  • Memory requirements: ~16 GB with native MXFP4 quantization, ~48 GB in bfloat16 (see the rough estimate after this list)
  • Deployment: designed for consumer-hardware compatibility
  • Quantization: native MXFP4 training enables efficient inference without quality degradation
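
As a sanity check on those figures, the sketch below computes a weights-only lower bound from the parameter count and bits per weight. It ignores the non-MoE weights kept at higher precision, the KV cache, and runtime overhead, which is why the practical numbers quoted above are somewhat higher.

```python
# Weights-only lower bound; MXFP4 is taken as ~4.25 bits/weight (4-bit values
# plus shared block scales), per the figure quoted in the text.
def weight_gib(n_params, bits_per_weight):
    return n_params * bits_per_weight / 8 / 2**30

n_params = 21e9
print(f"MXFP4:    ~{weight_gib(n_params, 4.25):.1f} GiB of weights")
print(f"bfloat16: ~{weight_gib(n_params, 16):.1f} GiB of weights")
```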

Performance characteristics

Qwen3 30B-A3B

  • Excels at mathematical reasoning, coding, and complex logical tasks
  • Strong performance in multilingual scenarios across 119 languages
  • Thinking mode benefits complicated reasoning problems

GPT-OSS 20B

  • Achieves performance comparable to OpenAI o3-mini on standard benchmarks
  • Optimized for tool use, web browsing, and function calling
  • Strong chain-of-thought reasoning with adjustable reasoning-effort levels (a usage sketch follows this list)
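
A hedged sketch of requesting higher reasoning effort from a locally served gpt-oss-20b through an OpenAI-compatible endpoint. The "Reasoning: high" system-message convention, the endpoint URL, and the served model name are assumptions based on the GPT-OSS documentation; check the official harmony format reference for the exact syntax.

```python
# Hedged sketch: adjustable reasoning effort via an OpenAI-compatible local server
# (e.g. vLLM or Ollama serving gpt-oss-20b); names and URL are assumptions.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")
resp = client.chat.completions.create(
    model="gpt-oss-20b",
    messages=[
        {"role": "system", "content": "Reasoning: high"},
        {"role": "user", "content": "Outline a tool-use plan to compare two MoE models."},
    ],
)
print(resp.choices[0].message.content)
```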

Use case recommendations

Choose Qwen3 30B-A3B for:

  • Complex reasoning tasks that require multi-stage processing
  • Multilingual applications across diverse languages
  • Scenarios that require flexible context-length extension
  • Applications where reasoning/thinking transparency is valued

Choose GPT-OSS 20B for:

  • Resource-constrained deployments that demand efficiency
  • Tool use and agentic applications
  • Fast inference with consistent performance
  • Edge deployment scenarios with limited memory

Conclusion

Qwen3 30B-A3B and GPT-OSS 20B represent complementary approaches to MoE architecture design. Qwen3 emphasizes depth, expert diversity, and multilingual capability, making it well suited to complex reasoning applications. GPT-OSS 20B prioritizes efficiency, tool integration, and deployment flexibility, positioning it for practical production environments with resource constraints.

Both models show how MoE architectures have evolved beyond simple parameter scaling, incorporating design choices that align architectural decisions with intended use cases and deployment scenarios.

Note: This article was inspired by a Reddit post and the diagram shared by Sebastian Raschka.


Sources

  1. Qwen3 30B-A3B – Hugging Face
  2. Qwen3 Technical Blog
  3. Qwen3 30B-A3B Specifications
  4. Qwen3 30B-A3B Instruct 2507
  5. Qwen3 Official Documentation
  6. Qwen Tokenizer Documentation
  7. Qwen3 Features
  8. OpenAI GPT-OSS Introduction
  9. GPT-OSS GitHub Repository
  10. GPT-OSS 20B – Groq Documentation
  11. OpenAI GPT-OSS Technical Details
  12. GPT-OSS – Hugging Face
  13. OpenAI GPT-OSS 20B
  14. OpenAI GPT-OSS Introduction
  15. NVIDIA GPT-OSS Technical Blog
  16. GPT-OSS – Hugging Face
  17. Qwen3 Performance Analysis
  18. OpenAI GPT-OSS
  19. GPT-OSS 20B Capabilities


Michal Sutter is a data science professional with a master's degree in Data Science from the University of Padova. With a solid foundation in statistical analysis, machine learning, and data engineering, Michal excels at transforming complex datasets into actionable insights.


