Beyond Refusal — Constructive Safety Alignment for Responsible Language Models

View a PDF of the paper titled Oyster-I: Beyond Refusal — Constructive Safety Alignment for Responsible Language Models, by Ranjie Duan and 26 other authors
Abstract: Large language models (LLMs) typically deploy safety mechanisms to prevent the generation of harmful content. Most current approaches focus narrowly on risks posed by malicious actors, often framing risk as adversarial events and relying on defensive refusals. In real-world settings, however, risk also comes from non-malicious users seeking help while in psychological distress (for example, with self-harm intentions). In such cases, the model's response can strongly influence the user's next actions: a simple refusal may lead them to retry, escalate, or move to unsafe platforms, producing worse outcomes. We introduce Constructive Safety Alignment (CSA), a human-centric paradigm that protects against malicious misuse while actively guiding vulnerable users toward safe and helpful outcomes. Implemented in Oyster-I (Oy1), CSA combines game-theoretic anticipation of user reactions, fine-grained risk boundary discovery, and interpretable reasoning control, turning safety into a trust-building process. Oy1 achieves state-of-the-art safety among open models while retaining high general capability. On our Constructive Benchmark it shows strong constructive engagement, close to GPT-5, and unmatched robustness on the Strata-Sword jailbreak dataset, approaching GPT-o1 levels. By shifting from refusal-first to guidance-first safety, CSA redefines the model-user relationship, aiming for systems that are not only safe but meaningfully helpful. We release Oy1, the code, and the benchmark to support responsible, user-centered AI.
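The abstract describes CSA only at a high level. As a rough illustration of the refusal-first versus guidance-first distinction (not the paper's actual method), a minimal sketch of a guidance-first response policy might look like the following; all names, fields, and thresholds here are hypothetical assumptions.

```python
from dataclasses import dataclass

# Hypothetical sketch of a "guidance-first" policy, loosely inspired by the CSA
# idea in the abstract. The risk fields and thresholds are illustrative only.

@dataclass
class RiskAssessment:
    harmful_intent: float  # estimated probability the request is malicious (0..1)
    user_distress: float   # estimated probability the user is vulnerable/distressed (0..1)

def choose_response_mode(risk: RiskAssessment) -> str:
    """Select a response strategy instead of defaulting to refusal."""
    if risk.harmful_intent > 0.8:
        # Clearly malicious misuse: refuse, but explain the boundary.
        return "refuse_with_explanation"
    if risk.user_distress > 0.5:
        # Vulnerable user: do not simply refuse; steer toward safe, helpful guidance.
        return "constructive_guidance"
    # Benign request: answer normally.
    return "helpful_answer"

if __name__ == "__main__":
    # A distressed but non-malicious user receives guidance rather than a refusal.
    print(choose_response_mode(RiskAssessment(harmful_intent=0.1, user_distress=0.9)))
```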
Submission history
From: Ranjie Duan
[v1] Tue, 2 Sep 2025 03:04:27 UTC (5,745 KB)
[v2] Thu, 4 Sep 2025 11:54:06 UTC (5,745 KB)
[v3] Mon, 8 Sep 2025 15:18:35 UTC (5,746 KB)
[v4] Fri, 12 Sep 2025 04:23:22 UTC (5,747 KB)