Beyond Refusal — Constructive Safety Alignment for Responsible Language Models

View a PDF of the paper titled Oyster-I: Beyond Refusal — Constructive Safety Alignment for Responsible Language Models, by Ranjie Duan and 26 other authors
Abstract: Large language models (LLMs) typically deploy safety mechanisms to prevent the generation of harmful content. Most current approaches focus narrowly on risks posed by malicious actors, often framing risk as adversarial events and relying on defensive refusals. In real-world settings, however, risk also comes from non-malicious users seeking help while in psychological distress (for example, with self-harm intentions). In such cases, the model's response can strongly influence the user's next actions: a simple refusal may lead them to retry, escalate, or move to unsafe platforms, producing worse outcomes. We introduce Constructive Safety Alignment (CSA), a human-centric paradigm that protects against malicious misuse while actively guiding vulnerable users toward safe and helpful outcomes. Implemented in Oyster-I (Oy1), CSA combines game-theoretic anticipation of user reactions, fine-grained risk boundary discovery, and interpretable reasoning control, turning safety into a trust-building process. Oy1 achieves state-of-the-art safety among open models while retaining high general capability. On our Constructive Benchmark it shows strong constructive engagement, close to GPT-5, and unmatched robustness on the Strata-Sword jailbreak dataset, approaching GPT-o1 levels. By shifting from refusal-first to guidance-first safety, CSA redefines the model-user relationship, aiming for systems that are not only safe but meaningfully helpful. We release Oy1, the code, and the benchmark to support responsible, user-centered AI.
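The abstract describes CSA only at a high level. As a rough illustration of the refusal-first versus guidance-first distinction (not the paper's actual method), a minimal sketch of a guidance-first response policy might look like the following; all names, fields, and thresholds here are hypothetical assumptions.

```python
from dataclasses import dataclass

# Hypothetical sketch of a "guidance-first" policy, loosely inspired by the CSA
# idea in the abstract. The risk fields and thresholds are illustrative only.

@dataclass
class RiskAssessment:
    harmful_intent: float  # estimated probability the request is malicious (0..1)
    user_distress: float   # estimated probability the user is vulnerable/distressed (0..1)

def choose_response_mode(risk: RiskAssessment) -> str:
    """Select a response strategy instead of defaulting to refusal."""
    if risk.harmful_intent > 0.8:
        # Clearly malicious misuse: refuse, but explain the boundary.
        return "refuse_with_explanation"
    if risk.user_distress > 0.5:
        # Vulnerable user: do not simply refuse; steer toward safe, helpful guidance.
        return "constructive_guidance"
    # Benign request: answer normally.
    return "helpful_answer"

if __name__ == "__main__":
    # A distressed but non-malicious user receives guidance rather than a refusal.
    print(choose_response_mode(RiskAssessment(harmful_intent=0.1, user_distress=0.9)))
```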
Submission history
From: Ranjie Duan
[v1] Tue, 2 Sep 2025 03:04:27 UTC (5,745 KB)
[v2] Thu, 4 Sep 2025 11:54:06 UTC (5,745 KB)
[v3] Mon, 8 Sep 2025 15:18:35 UTC (5,746 KB)
[v4] Fri, 12 Sep 2025 04:23:22 UTC (5,747 KB)