CNTXT AI Launches Munsit: The Most Accurate Arabic Speech Recognition System Ever Built


In a defining moment for Arabic-language artificial intelligence, CNTXT AI has unveiled Munsit, a next-generation Arabic speech recognition model that is not only the most accurate ever built for Arabic, but also decisively outperforms global giants such as OpenAI, Microsoft, and ElevenLabs on standard benchmarks. Developed in the United Arab Emirates and designed for Arabic from the ground up, Munsit represents a strong step forward in what CNTXT calls “sovereign AI”: technology built in the region, for the region, yet globally competitive.

The scientific foundations of this achievement are laid out in the team's newly published paper, “Advancing Arabic Speech Recognition Through Large-Scale Weakly Supervised Learning”, which introduces a scalable and efficient training method that addresses the long-standing scarcity of labeled Arabic speech data. That method, weakly supervised learning, enabled the team to build a system that sets a new bar for transcription quality across both Modern Standard Arabic (MSA) and more than 25 regional dialects.

Overcoming the data drought in Arabic ASR

Arabic, despite being one of the most widely spoken languages in the world and an official language of the United Nations, has long been treated as a low-resource language in speech recognition. This stems from both its morphological complexity and the lack of large, diverse, labeled speech datasets. Unlike English, which benefits from countless hours of manually transcribed audio, Arabic's dialectal richness and fragmented digital presence pose major challenges for building robust automatic speech recognition (ASR) systems.

Rather than waiting for the slow and costly process of manual transcription to catch up with demand, CNTXT AI pursued a radically more scalable path: weak supervision. Their approach began with a massive corpus of more than 30,000 hours of unlabeled Arabic audio collected from diverse sources. Through a custom-built data processing pipeline, this raw audio was cleaned, segmented, and automatically labeled to yield a high-quality 15,000-hour training dataset, one of the largest and most representative Arabic speech corpora ever assembled.

This process did not rely on human annotation. Instead, CNTXT developed a multi-stage system for generating, evaluating, and filtering hypotheses from multiple ASR models. These transcriptions were cross-compared using Levenshtein distance to identify the most consistent hypotheses, then passed through a language model to assess their grammatical plausibility. Segments that failed to meet the defined quality thresholds were discarded, ensuring that the training data remained reliable even without human verification. The team refined this pipeline over multiple iterations, each time improving label accuracy by retraining the ASR system itself and feeding it back into the labeling process.
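The consensus step can be illustrated with a small sketch. This is not CNTXT's pipeline: the function names, the word-level tokenization, and the 0.3 disagreement threshold are all assumptions made for illustration, and the language-model scoring stage is left out. It simply picks the hypothesis that sits closest, by normalized Levenshtein distance, to the other models' outputs, and rejects the segment when the models disagree too much.

```python
from typing import List, Optional, Sequence


def levenshtein(a: Sequence[str], b: Sequence[str]) -> int:
    """Edit distance between two token sequences (each edit costs 1)."""
    prev = list(range(len(b) + 1))
    for i, tok_a in enumerate(a, start=1):
        curr = [i]
        for j, tok_b in enumerate(b, start=1):
            cost = 0 if tok_a == tok_b else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def select_consensus(hypotheses: List[str],
                     max_avg_dist: float = 0.3) -> Optional[str]:
    """Return the hypothesis closest on average to all the others.

    max_avg_dist is a hypothetical quality threshold: if even the best
    hypothesis differs too much from the rest, the segment is rejected
    and None is returned.
    """
    if len(hypotheses) < 2:
        return None
    tokenized = [h.split() for h in hypotheses]
    best_idx, best_score = -1, float("inf")
    for i, hyp in enumerate(tokenized):
        dists = []
        for j, other in enumerate(tokenized):
            if i == j:
                continue
            denom = max(len(hyp), len(other), 1)
            dists.append(levenshtein(hyp, other) / denom)
        score = sum(dists) / len(dists)
        if score < best_score:
            best_idx, best_score = i, score
    return hypotheses[best_idx] if best_score <= max_avg_dist else None
```

In a weakly supervised setup of this kind, the surviving hypothesis would serve as the training label for its audio segment, while rejected segments are simply dropped from the dataset.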

Powering Munsit: the Conformer architecture

At the heart of Munsit is the Conformer model, a hybrid neural network architecture that combines the local sensitivity of convolutional layers with the global sequence modeling capabilities of transformers. This design makes the Conformer particularly adept at handling the nuances of spoken language, where both long-range dependencies (such as sentence structure) and fine-grained acoustic details are crucial.

CNTXT AI implemented a large variant of the Conformer and trained it from scratch using 80-channel mel-spectrograms as input. The model consists of 18 layers and has approximately 121 million parameters. Training was conducted on a high-performance cluster using eight NVIDIA A100 GPUs with bfloat16 precision, allowing efficient handling of large batch sizes and high-dimensional features. To handle tokenization of Arabic's morphologically rich structure, the team used a tokenizer trained specifically on its custom corpus, resulting in a vocabulary of 1,024 subword units.

Unlike traditional supervised ASR training, which typically requires each audio clip to be paired with a carefully transcribed label, CNTXT's method operates entirely on weak labels. These labels, though noisier than human-verified ones, were refined through a feedback loop that prioritized consensus, grammatical coherence, and lexical plausibility. The model was trained using the Connectionist Temporal Classification (CTC) loss function, which is well suited to unaligned sequence modeling, a critical property for speech recognition tasks in which the timing of spoken words is variable and unpredictable.
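To see why CTC copes with unpredictable timing, it helps to look at how a CTC-trained model's frame-level outputs are turned back into a label sequence. The sketch below shows the standard CTC collapsing rule (merge consecutive repeats, then drop the blank symbol) together with simple greedy decoding. It is a generic illustration rather than Munsit's decoder; the blank index of 0 and the toy score matrix are assumptions.

```python
from typing import List, Sequence

BLANK = 0  # assumed index of the CTC blank symbol


def ctc_collapse(frame_ids: Sequence[int], blank: int = BLANK) -> List[int]:
    """Apply the CTC rule: merge consecutive repeats, then drop blanks."""
    out: List[int] = []
    prev = None
    for idx in frame_ids:
        # Emit a label only when it differs from the previous frame
        # and is not the blank symbol.
        if idx != prev and idx != blank:
            out.append(idx)
        prev = idx
    return out


def ctc_greedy_decode(frame_scores: Sequence[Sequence[float]],
                      blank: int = BLANK) -> List[int]:
    """Pick the highest-scoring label in each frame, then collapse."""
    best = [max(range(len(scores)), key=scores.__getitem__)
            for scores in frame_scores]
    return ctc_collapse(best, blank)


# Two frame sequences with different timing yield the same labels.
fast = [1, 0, 2]
slow = [0, 1, 1, 1, 0, 0, 2, 2]
print(ctc_collapse(fast) == ctc_collapse(slow))
```

Because many different frame-level alignments collapse to the same output, the CTC loss can sum over all of them during training, so no frame-by-frame alignment between audio and transcript is needed.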

Dominating the benchmarks

The results speak for themselves. Munsit was tested against leading open-source and commercial ASR models on six benchmark Arabic datasets: SADA, Common Voice 18.0, MASC (clean and noisy), MGB-2, and Casablanca. Collectively, these datasets span dozens of dialects and accents across the Arab world, from Saudi Arabia to Morocco.

Across all benchmarks, Munsit-1 achieved an average word error rate (WER) of 26.68 and a character error rate (CER) of 10.05. By comparison, the best-performing version of OpenAI's Whisper recorded an average WER of 36.86 and a CER of 17.21. Meta's SeamlessM4T, another multilingual model, posted even higher error rates. Munsit outperformed every other system on both clean and noisy data, and showed particularly strong robustness in noisy conditions, a critical factor for real-world applications such as call centers and public services.
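For readers unfamiliar with these metrics, WER and CER are both edit-distance ratios: the number of substitutions, deletions, and insertions needed to turn the system output into the reference, divided by the reference length and conventionally expressed as a percentage. The sketch below computes both for a single utterance. It assumes whitespace tokenization for words and counts spaces as characters; published benchmarks usually add text normalization and aggregate errors over a whole test set, neither of which is shown here.

```python
from typing import Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Minimum substitutions, deletions, and insertions to turn hyp into ref."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate in percent, using whitespace tokenization."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return 100.0 * edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate in percent (spaces count as characters)."""
    if not reference:
        raise ValueError("reference must not be empty")
    return 100.0 * edit_distance(list(reference), list(hypothesis)) / len(reference)


# One substituted word and one dropped word out of six reference words.
print(round(wer("the cat sat on the mat", "the cat sit on mat"), 2))
```

Since insertions count as errors, WER can exceed 100 on very poor output, and CER is typically much lower than WER because a single wrong character makes an entire word count as an error.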

The gap was equally stark against proprietary systems. Munsit outperformed Microsoft Azure's Arabic ASR models, ElevenLabs Scribe, and even OpenAI's GPT-4o transcription feature. These are not marginal gains: they represent an average relative improvement of 23.19% in WER and 24.78% in CER over the strongest open baseline, establishing Munsit as the clear leader in Arabic speech recognition.

A platform for the future of Arabic voice AI

While Munsit-1 is already transforming what is possible in transcription, translation, and customer support across Arabic-speaking markets, CNTXT AI sees this launch as only the beginning. The company envisions a full suite of Arabic voice technologies, including text-to-speech, voice assistants, and real-time translation systems, all built on sovereign infrastructure and regionally relevant AI.

“Munsit is more than just a breakthrough in speech recognition,” said Mohamed Abu Sheikh, CEO of CNTXT AI. “It is a declaration that the Arabic language belongs at the forefront of global AI. We have proven that world-class AI does not need to be imported; it can be built here, in Arabic.”

With the rise of region-specific models such as Munsit, the AI industry is entering a new era, one in which linguistic and cultural relevance is not sacrificed in the pursuit of technical excellence. In fact, with Munsit, CNTXT AI has shown that the two are one and the same.


2025-04-30 22:22:00
