
Alibaba Qwen Team Releases Qwen3-ASR: A New Speech Recognition Model Built Upon Qwen3-Omni Achieving Robust Speech Recognition Performance





Alibaba Cloud's Qwen team has released Qwen3-ASR Flash, an all-in-one automatic speech recognition (ASR) model available as an API service. It is built on the strong intelligence of Qwen3-Omni and simplifies multilingual, noisy, and domain-specific transcription without juggling multiple systems.

Main capabilities

  • Multilingual recognition: Supports automatic language detection and transcription across 11 languages, including English and Chinese, plus Arabic, German, Spanish, French, Italian, Japanese, Korean, Portuguese, Russian, and Simplified Chinese (zh). This breadth positions Qwen3-ASR for global use without separate models.
  • Context injection mechanism: Users can paste arbitrary text (names, domain-specific terms, even nonsensical strings) to bias the transcription. This is especially powerful in scenarios rich in idioms, proper nouns, or evolving jargon.
  • Robust audio handling: Maintains performance in noisy environments, low-quality recordings, far-field input (for example, distant microphones), and multimedia vocals such as songs or rap. The reported word error rate (WER) stays below 8%, which is technically impressive for such varied inputs.
  • Single-model simplicity: Removes the complexity of maintaining different models for different languages or audio contexts. One model, served through an API, handles all of them.
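
As a rough illustration of what a sub-8% WER means, the metric is the word-level edit distance between a reference transcript and the model output, divided by the number of reference words. The sketch below is a minimal Python version; the function name and the lowercase/whitespace normalization are assumptions made for illustration, and it is not the evaluation script behind the reported figure.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count."""
    # Assumed normalization: lowercase and split on whitespace only.
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # substitution or exact match
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One substitution ("sat" -> "sit") and one deletion ("the"): 2 errors / 6 words.
    print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
```

Under this definition, a WER below 8% corresponds to fewer than 8 word-level errors per 100 reference words.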

Use cases span edtech platforms (lecture capture, multilingual tutoring), media (subtitling, voice-over), and customer service (multilingual IVR or support transcription).

https://qwen.ai/blog?

Technical assessment

  1. Language detection + transcription
    Automatic language detection lets the model identify the language before transcribing, which is crucial for mixed-language environments or passive audio capture. This reduces the need for manual language selection and improves usability.
  2. Context token injection
    Pasting text as "context" biases recognition toward the expected vocabulary. Technically, this could work through prefix tuning or by injecting embeddings of the context tokens into the input stream to influence decoding. It is a flexible way to adapt to a domain-specific lexicon without retraining the model.
  3. WER below 8% across complex scenarios
    Sustaining a sub-8% WER across music, rap, background noise, and low-fidelity audio puts Qwen3-ASR in the upper tier of open recognition systems. For comparison, strong models on clean read speech target a WER of 3-5%, but performance usually degrades in noisy or musical contexts.
  4. Multilingual coverage
    Support for 11 languages, including logographic Chinese and languages with differing phonotactics such as Arabic and Japanese, suggests substantial multilingual training data and cross-lingual modeling capacity. Handling both tonal (Mandarin) and non-tonal languages is non-trivial.
  5. Single-model architecture
    Operationally elegant: one model is deployed for all tasks. This reduces the ops burden, since there is no need to swap or select models dynamically. Everything runs in a unified ASR pipeline with built-in language detection.
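
To make the idea of context biasing concrete, here is a toy Python sketch that rescores an n-best list of candidate transcripts and rewards those containing user-supplied terms. This is a simpler, swapped-in technique (shallow n-best rescoring), not the prefix- or embedding-level mechanism speculated about above, and nothing here reflects how Qwen3-ASR is implemented. The function name, the `bonus` parameter, and the example scores are all hypothetical.

```python
def rescore_with_context(hypotheses, context_terms, bonus=1.0):
    """Return the hypothesis text that scores best after context biasing.

    hypotheses:    list of (text, score) pairs; a higher score is more likely.
    context_terms: user-supplied phrases (names, jargon) to favour.
    bonus:         hypothetical score boost per matched term (an assumed knob).
    """
    terms = [term.lower() for term in context_terms]

    def biased_score(item):
        text, score = item
        lowered = text.lower()
        # Count how many context terms appear as substrings of the candidate.
        hits = sum(1 for term in terms if term in lowered)
        return score + bonus * hits

    return max(hypotheses, key=biased_score)[0]


# Made-up candidates with made-up log-probability-style scores.
n_best = [
    ("please open the quen three model", -4.1),
    ("please open the Qwen3 model", -4.6),
]

if __name__ == "__main__":
    print(rescore_with_context(n_best, []))         # no context: top-scored candidate wins
    print(rescore_with_context(n_best, ["Qwen3"]))  # context favours the proper noun
```

The design point carried over from the text is that the bias is supplied at inference time as plain text, so adapting to a new vocabulary requires no retraining.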

Deployment and demo

The Hugging Face Space for Qwen3-ASR provides a live interface: upload audio, optionally enter context, and either choose a language or use automatic detection. The model is also available as an API service.

Conclusion

Qwen3-ASR Flash (available as an API service) is a technically compelling, deployment-friendly ASR solution. It offers a rare combination: multilingual support, context-aware transcription, and noise-robust recognition, all in one model.




Asif Razzaq is the CEO of Marktechpost Media Inc. As a visionary entrepreneur and engineer, Asif is committed to harnessing the potential of artificial intelligence for social good. His most recent endeavor is the launch of an artificial intelligence media platform, Marktechpost, which stands out for its in-depth coverage of machine learning and deep learning news that is both technically sound and easily understandable by a wide audience. The platform has more than 2 million monthly views, which reflects its popularity.







2025-09-09 09:15:00
