
Toward Expert-Level Medical Text Validation with Language Models

Authors: Asad Aali, Vasiliki Bikia, Maya Varma, Nicole Chiu, Sophie Ostmeier, Arnav Singhvi, Magdalini Paschali, Ashwin Kumar, Andrew Johnston, Karimar Amador-Martinez, Eduardo Juan Perez Guerrero, Paola Naovi Cruz Rivera, Sergios Gatidis, Christian Bluethgen, Eduardo Pontes Reis, Eddy D. Zandee van Rilland, Poonam Laxmappa Hosamani, Kevin R Keet, Minjoung Go, Evelyn Ling, David B. Larson, Curtis Langlotz, Roxana Daneshjou, Jason Hom, Sanmi Koyejo, Emily Alsentzer, Akshay S. Chaudhari

View a PDF of the paper titled MedVal: Toward Expert-Level Medical Text Validation with Language Models, by Asad Aali and 26 other authors


Abstract: With the growing use of language models (LMs) in clinical environments, there is an immediate need to evaluate the accuracy and safety of LM-generated medical text. Currently, such evaluation relies solely on manual physician review. However, detecting errors in LM-generated text is challenging because 1) manual review is costly and 2) expert-composed reference outputs are often unavailable in real-world settings. While the "LM-as-judge" paradigm (one LM evaluating another LM) offers scalable evaluation, even frontier LMs can miss subtle yet clinically significant errors. To address these challenges, we propose MedVal, a novel, self-supervised, data-efficient distillation method that leverages synthetic data to train evaluator LMs to assess whether LM-generated medical outputs are factually consistent with their inputs, without requiring physician labels or reference outputs. To evaluate LM performance, we introduce MedVal-Bench, a dataset of 840 physician-annotated outputs across 6 diverse medical tasks that capture real-world challenges. Across 10 state-of-the-art LMs spanning open-source and proprietary models, MedVal distillation significantly improves (p < 0.001) alignment with physicians on both seen and unseen tasks, increasing the average F1 score from 66% to 83%. Despite its strong baseline performance, MedVal improves the best-performing proprietary LM (GPT-4o) by 8% without training on physician-labeled data, demonstrating performance that is statistically non-inferior to a single human expert (p < 0.001). To support a scalable, risk-aware pathway toward clinical integration, we open-source the 1) codebase (this https URL), 2) MedVal-Bench (this https URL), and 3) MedVal-4B (this https URL). Our benchmark provides evidence that LMs are approaching expert-level ability in validating AI-generated medical text.
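The abstract reports agreement between evaluator LMs and physicians as an average F1 score. As a rough illustration of what such a number measures, the sketch below computes a macro-averaged F1 between physician labels and an evaluator's verdicts. It is not the authors' code: the label names ("safe", "unsafe"), the toy data, and the choice of macro averaging are assumptions made for this example, and the paper's exact label scheme and averaging may differ.

```python
def f1_for_label(gold, pred, label):
    """F1 for one label, treating that label as the positive class."""
    pairs = list(zip(gold, pred))
    tp = sum(1 for g, p in pairs if g == label and p == label)
    fp = sum(1 for g, p in pairs if g != label and p == label)
    fn = sum(1 for g, p in pairs if g == label and p != label)
    if tp == 0:
        # No true positives: precision or recall is zero (or undefined).
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def macro_f1(gold, pred):
    """Unweighted mean of per-label F1 over every label seen in either list."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    labels = sorted(set(gold) | set(pred))
    return sum(f1_for_label(gold, pred, lab) for lab in labels) / len(labels)


# Hypothetical toy data: physician annotations vs. an evaluator LM's verdicts.
physician = ["safe", "unsafe", "safe", "unsafe", "safe", "safe"]
evaluator = ["safe", "unsafe", "unsafe", "unsafe", "safe", "safe"]

score = macro_f1(physician, evaluator)
print(f"macro F1 = {score:.3f}")
```

In this toy case the evaluator disagrees with the physician on one of six outputs, which gives a macro F1 of about 0.829. Macro averaging weights each label equally, so errors on a rare label (for example, unsafe outputs) are not hidden by a large majority class.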

Submission history

From: Asad Aali
[v1] Thu, 3 Jul 2025 20:19:18 UTC (5,655 KB)
[v2] Mon, 14 Jul 2025 17:51:35 UTC (5,656 KB)
[v3] Tue, 2 Sep 2025 02:30:57 UTC (5,812 KB)
[v4] Thu, 18 Sep 2025 04:11:49 UTC (8,582 KB)


2025-09-19 04:00:00
