NVIDIA AI Just Released the Largest Open-Source Speech AI Dataset and State-of-the-Art Models for European Languages


NVIDIA has taken a major step forward in multilingual speech AI, unveiling Granary, the largest open-source speech dataset for European languages, along with two state-of-the-art models: Canary-1B-V2 and Parakeet-TDT-0.6B-V3. This release sets a new standard for accessible, high-quality resources in automatic speech recognition (ASR) and automatic speech translation (AST), especially for underrepresented European languages.

Granary: The foundation of multilingual speech AI

Granary is a massive multilingual corpus developed in collaboration with Carnegie Mellon University and Fondazione Bruno Kessler. It delivers around one million hours of audio, with 650,000 hours for speech recognition and 350,000 hours for speech translation. The dataset covers 25 European languages, representing nearly all official EU languages plus Russian and Ukrainian, with a particular focus on languages with limited annotated data, such as Croatian, Estonian, and Maltese.

Main Features:

The largest open-source speech dataset for 25 European languages.
Pseudo-labeling pipeline: Unlabeled public audio data is processed using the NVIDIA NeMo Speech Data Processor, which adds structure and improves quality, reducing the need for resource-intensive manual annotation.
Supports ASR and AST: Designed for both transcription and translation tasks.
Open access: Available to the global developer community for flexible, production-scale model building.

By relying on clean, high-quality data, Granary enables significantly faster model convergence. Research shows that developers need about half as much Granary training data to reach a target accuracy compared with competing datasets, which makes it especially valuable for resource-constrained languages and rapid prototyping.

Canary-1B-V2: Multilingual ASR + translation (English ↔ 24 languages)

Canary-1B-V2 is a billion-parameter encoder-decoder model trained on Granary, delivering high-quality transcription and translation between English and 24 supported European languages.

It stands out for its accuracy and multitask capabilities:

Supported languages: 25 European languages, expanding Canary's coverage from the 4 languages supported previously.
State-of-the-art performance: Accuracy comparable to models three times larger, with up to 10× faster inference.
Multitask capabilities: Robust across both ASR and AST tasks.
Features: Automatic punctuation, capitalization, and word- and segment-level timestamps, including timestamps for translated outputs.
Architecture: FastConformer encoder with a Transformer decoder; unified vocabulary for all languages via a SentencePiece tokenizer.
Robustness: Maintains strong performance under noisy conditions and resists hallucinations.
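
As an illustration of what segment-level timestamps make possible, the sketch below turns a list of timed segments into SRT subtitles. The input shape (dictionaries with "start", "end", and "text" keys, times in seconds) is an assumption made for this example and is not taken from the model's documented output format; the function names are hypothetical as well.

```python
def format_srt_time(seconds: float) -> str:
    """Convert seconds to the SRT timestamp form HH:MM:SS,mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments: list[dict]) -> str:
    """Build SRT text from segments shaped like
    {"start": 0.0, "end": 1.25, "text": "Hello"} (assumed shape)."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        start = format_srt_time(seg["start"])
        end = format_srt_time(seg["end"])
        blocks.append(f"{index}\n{start} --> {end}\n{seg['text'].strip()}")
    # SRT entries are separated by a blank line.
    return "\n\n".join(blocks) + "\n"


if __name__ == "__main__":
    demo = [
        {"start": 0.0, "end": 1.25, "text": "Hello"},
        {"start": 1.5, "end": 3.0, "text": "Bonjour"},
    ]
    print(segments_to_srt(demo))
```

Because the article says translated outputs carry timestamps too, the same conversion would apply to translated segments, provided they are mapped into this assumed shape first.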

Evaluation highlights:

ASR word error rate (WER): 7.15% (AMI dataset), 10.82% (LibriSpeech Clean).
AST COMET scores: 79.3 (X → English), 84.56 (English → X).
Deployment: Available under the CC BY 4.0 license; optimized for NVIDIA GPUs, enabling fast training and inference for scalable production use.
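
For readers unfamiliar with the WER metric cited in the evaluation figures, here is a minimal sketch of how word error rate is commonly computed: the word-level edit distance (substitutions, deletions, insertions) divided by the number of reference words. This is a generic illustration, not the evaluation code behind the reported numbers; published scores usually also apply text normalization, which this sketch leaves out.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] = edit distance between the reference words seen so far
    # (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words to match an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One substitution and one deletion over four reference words.
    print(word_error_rate("a b c d", "a x c"))
```

A WER of 7.15% therefore means roughly 7 word-level errors per 100 reference words under whatever normalization the evaluation used.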

Parakeet-TDT-0.6B-V3: Real-time multilingual ASR

Parakeet-TDT-0.6B-V3 is a 600-million-parameter multilingual ASR model designed for high-throughput or large-volume transcription in all supported languages.

Automatic language detection: Transcribes input audio without the need for additional prompts.
Real-time capability: Efficiently transcribes audio segments of up to 24 minutes in a single inference pass.
Fast, scalable, and commercial-ready: Prioritizes low latency, batch processing, and accurate outputs, with word-level timestamps, punctuation, and capitalization.
Robustness: Reliable even on complex content (numbers, lyrics) and challenging audio conditions.
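
Given the 24-minute limit per inference pass mentioned above, longer recordings would have to be split before transcription. The sketch below plans fixed-length chunk boundaries for a recording of known duration. The function name and the naive fixed-length strategy are assumptions for illustration; a real pipeline would more likely cut at silences so that words are not split across chunks.

```python
MAX_CHUNK_SECONDS = 24 * 60  # the 24-minute figure quoted in the article


def plan_chunks(total_seconds: float,
                max_chunk_seconds: float = MAX_CHUNK_SECONDS) -> list[tuple[float, float]]:
    """Return (start, end) boundaries in seconds, each at most max_chunk_seconds long."""
    if total_seconds < 0:
        raise ValueError("total_seconds must be non-negative")
    if max_chunk_seconds <= 0:
        raise ValueError("max_chunk_seconds must be positive")

    chunks = []
    start = 0.0
    while start < total_seconds:
        end = min(start + max_chunk_seconds, total_seconds)
        chunks.append((start, end))
        start = end
    return chunks


if __name__ == "__main__":
    # A one-hour recording needs three passes: 24 + 24 + 12 minutes.
    for start, end in plan_chunks(3600):
        print(f"{start:.0f}s -> {end:.0f}s")
```

Each planned span could then be cut from the source audio and transcribed separately, with the chunk's start time added to any timestamps the model returns.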

Impact on speech AI development

NVIDIA's Granary dataset and model suite accelerate the democratization of speech AI for Europe, enabling scalable development of:

Multilingual chatbots
Customer service voice agents
Near-real-time translation services

Developers, researchers, and businesses can now build inclusive, high-quality applications that support linguistic diversity, with open access to these models and datasets.

Check out Granary, NVIDIA Canary-1B-V2, and NVIDIA Parakeet-TDT-0.6B-V3.

Asif Razzaq is the CEO of Marktechpost Media Inc. As a visionary entrepreneur and engineer, Asif is committed to harnessing the potential of artificial intelligence for social good. His most recent endeavor is the launch of an artificial intelligence media platform, Marktechpost, which stands out for in-depth coverage of machine learning and deep learning news that is both technically sound and easily understandable by a wide audience. The platform has over 2 million monthly views.


2025-08-16 05:29:00
