Surveying Techniques from Alignment to Reasoning

[Submitted on 8 Mar 2025 (v1), last revised 21 May 2025 (this version, v2)]

Authors: Guiyao Tie, Zeli Zhao, Dingjie Song, Fuyang Wei, Rong Zhou, Yurou Dai, Wen Yin, Zhejian Yang, Jiangyue Yan, Yao Su, Liu, Neil Zhenqiang Gong, Jiliang Tang, Caiming Xiong, Heng Ji, Philip S. Yu, Jianfeng Gao

View a PDF of the paper titled Large Language Models Post-training: Surveying Techniques from Alignment to Reasoning

Abstract: The emergence of Large Language Models (LLMs) has fundamentally transformed natural language processing, making them indispensable across domains ranging from conversational systems to scientific exploration. However, their pre-trained architectures often reveal limitations in specialized contexts, including restricted reasoning capacities, ethical uncertainties, and suboptimal domain-specific performance. These challenges call for advanced post-training language models (PoLMs), such as OpenAI-o1/o3 and DeepSeek-R1 (collectively known as Large Reasoning Models, or LRMs), to address these shortcomings. This paper presents the first comprehensive survey of PoLMs, systematically tracing their evolution across five core paradigms: Fine-tuning, which enhances task-specific accuracy; Alignment, which ensures ethical coherence and agreement with human preferences; Reasoning, which advances multi-step inference despite challenges in reward design; Efficiency, which optimizes resource utilization amid increasing complexity; and Integration and Adaptation, which extend capabilities across diverse modalities while addressing coherence issues. Charting progress from ChatGPT's alignment strategies to DeepSeek-R1's innovative reasoning advancements, we illustrate how PoLMs leverage datasets to mitigate biases, deepen reasoning capabilities, and enhance domain adaptability. Our contributions include a pioneering synthesis of PoLM evolution, a structured taxonomy categorizing techniques and datasets, and a strategic agenda emphasizing the role of LRMs in improving reasoning proficiency and domain flexibility. As the first survey of its scope, this work consolidates recent PoLM advancements and establishes a rigorous intellectual framework for future research, fostering the development of LLMs that excel in precision, ethical robustness, and versatility across scientific and societal applications.