[2508.00743] Agentic large language models improve retrieval-based radiology question answering


[Submitted on 1 Aug 2025]

Authors: Sebastian Wind, Jeta Sopa, Daniel Truhn, Mahshad Lotfinia, Tri-Thien Nguyen, Keno Bressem, Lisa Adams, Mirabela Rusu, Harald Köstler, Gerhard Wellein, Andreas Maier, Soroosh Tayebi Arasteh


Abstract: Clinical decision-making in radiology increasingly benefits from artificial intelligence (AI), particularly through large language models (LLMs). However, traditional retrieval-augmented generation (RAG) systems for radiology question answering (QA) typically rely on single-step retrieval, which limits their ability to handle complex clinical reasoning tasks. Here we propose an agentic RAG framework that enables LLMs to autonomously decompose radiology questions, iteratively retrieve targeted clinical evidence from Radiopaedia, and dynamically synthesize evidence-based responses. We evaluated 24 LLMs spanning diverse architectures, parameter scales (0.5B to >670B), and training paradigms (general-purpose, reasoning-optimized, and clinically fine-tuned), using 104 expert-curated radiology questions from the previously established RSNA-RadioQA and ExtendedQA datasets. Agentic retrieval significantly improved mean diagnostic accuracy over zero-shot prompting (73% vs. 64%; P<0.001) and conventional online RAG (73% vs. 68%; P<0.001). The greatest gains occurred in mid-sized models (e.g., Mistral Large improved from 72% to 81%) and small-scale models (e.g., Qwen 2.5-7B improved from 55% to 71%), while very large models (>200B parameters) showed minimal changes (<2% improvement). Additionally, agentic retrieval reduced hallucinations (mean 9.4%) and retrieved clinically relevant context in 46% of cases, substantially aiding factual grounding. Even clinically fine-tuned models showed meaningful improvements (e.g., MedGemma-27B improved from 71% to 81%), indicating complementary roles of retrieval and fine-tuning. These results highlight the potential of agentic frameworks to enhance factuality and diagnostic accuracy in radiology QA, particularly among mid-sized LLMs, warranting future studies to validate their clinical utility.
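
The decompose, retrieve, and synthesize loop described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the in-memory `CORPUS`, the keyword-based `decompose`, and the names `retrieve`, `agentic_answer`, and `synthesize` are all hypothetical stand-ins. In the framework described above, an LLM would perform the decomposition and synthesis, and retrieval would target Radiopaedia.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical in-memory stand-in for a reference corpus (assumption for illustration).
CORPUS = {
    "pneumothorax": "Pneumothorax: air in the pleural space.",
    "tension": "Tension pneumothorax: mediastinal shift away from the affected side.",
    "effusion": "Pleural effusion: fluid in the pleural space.",
}


@dataclass
class AgentState:
    """Tracks the question and the evidence gathered so far."""
    question: str
    evidence: List[str] = field(default_factory=list)


def decompose(question: str) -> List[str]:
    """Toy decomposition: one sub-query per known key term found in the question.
    An agentic system would delegate this step to an LLM."""
    q = question.lower()
    return [term for term in CORPUS if term in q]


def retrieve(sub_query: str) -> Optional[str]:
    """Toy retrieval: exact key lookup in the stand-in corpus."""
    return CORPUS.get(sub_query)


def agentic_answer(question: str, max_steps: int = 5) -> AgentState:
    """Retrieve evidence one sub-query at a time, up to max_steps retrievals."""
    state = AgentState(question)
    pending = decompose(question)
    steps = 0
    while pending and steps < max_steps:
        passage = retrieve(pending.pop(0))
        if passage and passage not in state.evidence:
            state.evidence.append(passage)
        steps += 1
    return state


def synthesize(state: AgentState) -> str:
    """Toy synthesis: concatenate the evidence; an LLM would write the answer."""
    if not state.evidence:
        return "No supporting evidence retrieved."
    return " ".join(state.evidence)


if __name__ == "__main__":
    print(synthesize(agentic_answer("Is this a tension pneumothorax?")))
```

The `max_steps` cap is a design choice of this sketch that bounds the number of retrieval rounds. By contrast, the single-step RAG baseline mentioned in the abstract would issue one retrieval for the whole question.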