[2507.13264] Voxtral


[Submitted on 17 Jul 2025]

Authors: Alexander H. Liu, Andy Ehrenberg, Andy Lo, Clément Denoix, Corentin Barreau, Guillaume Lample, Jean-Malo Delignon, Khyathi Raghavi Chandu, Patrick von Platen, Pavankumar Reddy Muddireddy, Sanchit Gandhi, Soham Ghosh, Srijan Mishra, Thomas Foubert, Abhinav Rastogi, Adam Yang, Albert Q. Jiang, Alexandre Sablayrolles, Amélie Héliou, Amélie Martin, Anmol Agarwal, Antoine Roux, Arthur Darcet, Arthur Mensch, Baptiste Bout, Baptiste Rozière, Baudouin De Monicault, Chris Bamford, Christian Wallenwein, Christophe Renaudin, Clémence Lanfranchi, Darius Dabert, Devendra Singh Chaplot, Devon Mizelle, Diego de las Casas, Elliot Chane-Sane, Emilien Fugier, Emma Bou Hanna, Gabrielle Berrada, Gauthier Delerce, Gauthier Guinet, Georgii Novikov, Guillaume Martin, Himanshu Jaju, Jan Ludziejewski, Jason Rute, Jean-Hadrien Chabran, Jessica Chudnovsky, Joachim Studnia, Joep Barmentlo, Jonas Amar, Josselin Somerville Roberts, Julien Denize, Karan Saxena, Karmesh Yadav, Kartik Khandelwal, Kush Jain, Lélio Renard Lavaud, Léonard Blier, Lingxiao Zhao, Louis Martin, Lucile Saulnier, Luyu Gao, Marie Pellat, Mathilde Guillaumin, Mathis Felardos, Matthieu Dinot, Maxime Darrin, Maximilian Augustin, Mickaël Seznec, Neha Gupta, Nikhil Raghuraman, Olivier Duchenne, Patricia Wang, Patryk Saffer, Paul Jacob, Paul Wambergue, Paula Kurylowicz, Philomène Chagniot, Pierre Stock, Pravesh Agrawal, Rémi Delacourt, Romain Sauvestre, Roman Soletskyi, Sagar Vaze, Sandeep Subramanian, Saurabh Garg, Shashwat Dalal, Siddharth Gandhi, Sumukh Aithal, Szymon Antoniak, Teven Le Scao, Thibault Schueller, Thibaut Lavril, Thomas Robert, Thomas Wang, Timothée Lacroix, Tom Bewley, Valeriia Nemychnikova, Victor Paltz, et al. (6 additional authors not shown)

View a PDF of the paper titled Voxtral, by Alexander H. Liu and 105 other authors


Abstract: We present Voxtral Mini and Voxtral Small, two multimodal audio chat models. Voxtral is trained to comprehend both spoken audio and text documents, achieving state-of-the-art performance across a diverse range of audio benchmarks while preserving strong text capabilities. Voxtral Small outperforms a number of closed-source models while being small enough to run locally. A 32K context window enables the model to handle audio files up to 40 minutes in duration and long multi-turn conversations. We also contribute three benchmarks for evaluating speech understanding models on knowledge and trivia. Both Voxtral models are released under the Apache 2.0 license.
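
The two figures in the abstract (a 32K context window and audio of up to 40 minutes) imply a rough ceiling on how many tokens each second of audio can occupy. The sketch below works through that arithmetic. It assumes "32K" means 32 × 1024 tokens and that the audio must fit in the context on its own; the function names and the 12.5 tokens-per-second rate in the second calculation are illustrative assumptions, not values stated in the abstract.

```python
# Assumption: "32K" is read as 32 * 1024 = 32,768 tokens.
CONTEXT_TOKENS = 32 * 1024
# 40 minutes of audio, as stated in the abstract.
MAX_AUDIO_SECONDS = 40 * 60


def max_tokens_per_second(context_tokens: int, audio_seconds: int) -> float:
    """Upper bound on the audio token rate if the audio alone fills the context."""
    return context_tokens / audio_seconds


def tokens_left_for_text(context_tokens: int, audio_seconds: int,
                         tokens_per_second: float) -> int:
    """Tokens remaining for prompts and replies after the audio is encoded."""
    audio_tokens = int(audio_seconds * tokens_per_second)
    return context_tokens - audio_tokens


if __name__ == "__main__":
    ceiling = max_tokens_per_second(CONTEXT_TOKENS, MAX_AUDIO_SECONDS)
    print(f"Audio token rate ceiling: {ceiling:.2f} tokens/s")

    # Hypothetical rate chosen for illustration only.
    remaining = tokens_left_for_text(CONTEXT_TOKENS, MAX_AUDIO_SECONDS, 12.5)
    print(f"Tokens left for text at 12.5 tokens/s: {remaining}")
```

Under these assumptions the audio encoder could emit at most about 13.65 tokens per second of audio, and any rate below that ceiling leaves a small remainder of the window for the text prompt and the model's reply.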