Multi-Agent simulaTion fRamework for safe Interactions and conteXtual clinical conversational evaluation


[Submitted on 26 Aug 2025]

Authors: Ernest Lim, Yaji Vera Hu, Jared Goselovitz, Kate Preston, Mohaita Chaudhry, Luis Williams, Iceng Hayam, Katrina Masson, Marianne Milo, Tom Luton, Yan Jia, Ibrahim Hubble


Abstract: Despite the growing use of large language models (LLMs) in clinical dialogue systems, existing evaluations focus on task completion or fluency, offering little insight into the behavioral and risk management requirements essential for safety-critical systems. This paper presents MATRIX (Multi-Agent simulaTion fRamework for safe Interactions and conteXtual clinical conversational evaluation), a structured, extensible framework for safety-oriented evaluation of clinical dialogue agents.

MATRIX integrates three components: (1) a taxonomy of clinical scenarios, expected system behaviors, and failure modes derived through structured safety engineering methods; (2) BehvJudge, an LLM-based evaluator for detecting safety-relevant dialogue failures, validated against expert clinician annotations; and (3) PatBot, a simulated patient agent capable of producing diverse, scenario-conditioned responses, evaluated for realism and behavioral fidelity with human factors expertise and a patient-preference study.
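To make the relationship between the three components concrete, the following is a minimal sketch of how a scenario, a simulated patient, a dialogue agent, and a judge could be wired together. Every name in it (`Scenario`, `run_dialogue`, `evaluate`, and the stub functions) is a hypothetical illustration, not the released MATRIX tooling, and the stubs stand in for what would be LLM calls in practice.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# NOTE: all names below are hypothetical; this is not the MATRIX API.

@dataclass(frozen=True)
class Scenario:
    clinical_domain: str     # taxonomy entry: clinical area
    hazard: str              # taxonomy entry: hazard under test
    expected_behavior: str   # what a safe agent should do

Turn = Tuple[str, str]  # (speaker, utterance)
Speaker = Callable[[Scenario, List[Turn]], str]
Judge = Callable[[Scenario, List[Turn]], bool]  # True = safety failure found


def run_dialogue(scenario: Scenario, patient: Speaker, agent: Speaker,
                 max_turns: int = 4) -> List[Turn]:
    """Alternate patient and agent utterances for a fixed number of rounds."""
    transcript: List[Turn] = []
    for _ in range(max_turns):
        transcript.append(("patient", patient(scenario, transcript)))
        transcript.append(("agent", agent(scenario, transcript)))
    return transcript


def evaluate(scenarios: List[Scenario], patient: Speaker, agent: Speaker,
             judge: Judge) -> float:
    """Return the fraction of scenarios in which the judge flags a failure."""
    if not scenarios:
        raise ValueError("at least one scenario is required")
    flagged = sum(judge(s, run_dialogue(s, patient, agent)) for s in scenarios)
    return flagged / len(scenarios)


# Stub components standing in for LLM-backed ones (assumptions for the demo).
def scripted_patient(scenario: Scenario, transcript: List[Turn]) -> str:
    return f"I am worried about {scenario.hazard}."


def passive_agent(scenario: Scenario, transcript: List[Turn]) -> str:
    return "Please continue."


def keyword_judge(scenario: Scenario, transcript: List[Turn]) -> bool:
    # Flags a failure when no agent turn mentions escalation.
    return not any("escalate" in text.lower()
                   for who, text in transcript if who == "agent")


if __name__ == "__main__":
    demo = Scenario("example domain", "example hazard", "escalate to a clinician")
    print(evaluate([demo], scripted_patient, passive_agent, keyword_judge))
```

Passing the patient, agent, and judge as plain callables keeps the loop independent of any particular model provider, so the same scenarios can be replayed against different agents.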

Across three experiments, we show that MATRIX enables systematic, scalable safety evaluation. BehvJudge with Gemini 2.5-Pro achieves expert-level hazard detection (F1 0.96, sensitivity 0.999), outperforming clinicians in a blinded assessment of 240 dialogues. We also conducted one of the first realism analyses of LLM-based patient simulation, showing that PatBot reliably simulates realistic patient behavior in quantitative and qualitative evaluations. Using MATRIX, we demonstrate its effectiveness in benchmarking five LLM agents across 2,100 simulated dialogues spanning 14 hazard scenarios and 10 clinical domains.
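For readers unfamiliar with the two reported metrics, the sketch below shows how sensitivity and F1 are computed from hazard-detection counts. The counts in the usage example are made up for illustration and are not taken from the paper.

```python
def sensitivity(tp: int, fn: int) -> float:
    """Share of true hazards that the judge detected: TP / (TP + FN)."""
    return tp / (tp + fn)


def precision(tp: int, fp: int) -> float:
    """Share of flagged dialogues that were true hazards: TP / (TP + FP)."""
    return tp / (tp + fp)


def f1_score(tp: int, fp: int, fn: int) -> float:
    """Harmonic mean of precision and sensitivity: 2TP / (2TP + FP + FN)."""
    return 2 * tp / (2 * tp + fp + fn)


if __name__ == "__main__":
    # Illustrative counts only (assumed, not from the paper).
    tp, fp, fn = 90, 5, 10
    print(f"sensitivity={sensitivity(tp, fn):.3f}")
    print(f"precision={precision(tp, fp):.3f}")
    print(f"F1={f1_score(tp, fp, fn):.3f}")
```

A sensitivity close to 1 means almost no hazardous dialogue goes unflagged, which is why it is reported alongside F1 for a safety-oriented judge.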

MATRIX is the first framework to unify structured safety engineering with scalable, validated conversational AI evaluation, enabling regulator-aligned safety auditing. We release all evaluation tools, prompts, structured scenarios, and datasets.