Alibaba’s new Qwen model to supercharge AI transcription tools
AI transcription tools are about to get more competition, as Alibaba's Qwen team rolls out the Qwen3-ASR-Flash model.
Built on the intelligence of the powerful Qwen3-Omni and trained on a massive dataset spanning tens of millions of hours of speech, this is not just another AI speech recognition model. The team says it is designed to deliver highly accurate performance, even when faced with difficult audio environments or complex language styles.
So, how does it stack up against the competition? Performance data from tests conducted in August 2025 suggests it's rather impressive.
In a general test on standard Chinese, Qwen3-ASR-Flash achieved an error rate of just 3.97 percent, leaving competitors such as Gemini-2.5-Pro (8.98 percent) and GPT-4o-Transcribe (15.72 percent) trailing behind and underlining its competitiveness among AI speech tools.
Qwen3-ASR-Flash also proved adept at handling Chinese dialects, with an error rate of 3.48 percent. In English, it recorded 3.81 percent, once again comfortably beating Gemini's 7.63 percent and GPT-4o's 8.45 percent.
But where it really turns heads is in a notoriously difficult area: transcribing music.
When tasked with recognising lyrics from songs, Qwen3-ASR-Flash posted an error rate of just 4.51 percent, far better than its rivals. This ability to understand music was confirmed in internal tests on full songs, where it recorded a 9.96 percent error rate; a significant improvement on Gemini-2.5-Pro's 32.79 percent and GPT-4o-Transcribe's 58.59 percent.
Beyond its impressive accuracy, the model brings some innovative features to the table for next-generation AI transcription tools. One of the biggest game-changers is flexible contextual biasing.
Forget the days of rigidly formatted keyword lists; this system lets users feed in background text in almost any format for customised results. You can provide a simple list of keywords, entire documents, or even a messy mixture of the two.
This eliminates any need for complex pre-processing of contextual information. The model is smart enough to use the context to boost its accuracy, yet its general performance is unaffected even if the supplied text is completely irrelevant.
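As a rough illustration of what "context in any format" could look like from a developer's side, here is a minimal sketch of assembling a transcription request that pairs audio with free-form biasing text. The function name, payload fields, and model identifier casing are illustrative assumptions, not the actual Qwen API:

```python
# Hypothetical sketch of contextual biasing as described in the article:
# background text in any shape rides along with the audio, no pre-processing.
# All field names below are assumptions for illustration only.

def build_transcription_request(audio_url: str, context: str = "") -> dict:
    """Assemble a request payload pairing audio with free-form biasing text."""
    payload = {
        "model": "qwen3-asr-flash",
        "audio": {"url": audio_url},
    }
    if context:
        # Per the article, any format works: a keyword list, whole documents,
        # or a messy mixture of the two.
        payload["context"] = context
    return payload

# Keywords and a document fragment mixed together, as the article describes.
request = build_transcription_request(
    "https://example.com/earnings-call.mp3",
    context="Qwen3-Omni, ASR, Hokkien\nQ3 revenue grew year over year...",
)
```

The point of the sketch is that the biasing text is passed through as-is; the model, not the caller, is responsible for making sense of it.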
Alibaba's ambition is clearly for this AI model to become a universal speech transcription tool. The service provides accurate transcriptions from a single model covering 11 languages, complete with many dialects and accents.
Support for Chinese is particularly deep, covering Mandarin as well as major dialects such as Cantonese, Sichuanese, Minnan (Hokkien), and Wu.
For English speakers, it handles British, American, and other regional accents. The list of other supported languages includes French, German, Spanish, Italian, Portuguese, Russian, Japanese, Korean, and Arabic.
To round it all off, the model can automatically detect which of the 11 languages is being spoken and is intelligent enough to reject non-speech segments such as silence or background noise, ensuring cleaner output than past transcription tools.

Want to learn more about AI and big data from industry leaders? Check out AI & Big Data Expo, taking place in Amsterdam, California, and London. The comprehensive event is part of TechEx and co-located with other leading technology events.
AI News is powered by TechForge Media. Explore other upcoming events and webinars here.
2025-09-08 16:33:00



