
Two Heads are Better than One: Collaborative Reward Modeling for LLM Alignment

Authors: Jiazheng Zhang, Wenqing Jing, Zizhuo Zhang, Zhiheng Xi, Shihan Dou, Rongxiang Weng, Jiahuan Li, Jingang Wang, Mingxu Chai, Shibo Hong, Tao Gui, Qi Zhang

View a PDF of the paper titled Two Heads are Better than One: Collaborative Reward Modeling for LLM Alignment


Abstract: Reward models (RMs) play a pivotal role in aligning large language models (LLMs) with human values. However, noisy preferences in human feedback can lead to reward misgeneralization, a phenomenon in which reward models learn spurious correlations or overfit noisy preferences, posing important challenges to the generalization of RMs. This paper systematically analyzes the characteristics of preference pairs and aims to identify how noisy preferences differ from human-aligned preferences in reward modeling. Our analysis reveals that noisy preferences are difficult for RMs to fit, as they cause severe training fluctuations and irregular gradient updates. These distinctive dynamics suggest that such noisy preferences can be identified and excluded. Empirical studies show that a policy LLM optimized with a reward model trained on the full preference dataset, which includes substantial noise, performs worse than one trained on a subset of exclusively high-quality preferences. To address this challenge, we propose an online Collaborative Reward Modeling (CRM) framework to achieve robust preference learning through peer review and curriculum learning. In particular, CRM maintains two RMs that collaboratively filter potentially noisy preferences by peer-reviewing each other's data selections. Curriculum learning synchronizes the capabilities of the two models, mitigating excessive disparities to promote the utility of peer review. Extensive experiments demonstrate that CRM significantly enhances RM generalization, with up to 9.94 points of improvement on RewardBench under an extreme 40% noise rate. Moreover, CRM extends seamlessly to implicit-reward alignment methods, offering a robust and versatile alignment strategy.
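The abstract's peer-review-plus-curriculum recipe can be made concrete with a short sketch. The following is a minimal, assumption-laden reading rather than the paper's actual algorithm: it assumes a per-pair Bradley-Terry loss, a co-teaching-style exchange in which each RM trains on the pairs its peer flags as low-loss (likely clean), and a simple linear keep-ratio schedule standing in for curriculum learning. Names such as crm_step and keep_ratio are hypothetical.

```python
# Minimal sketch of an online collaborative reward-modeling (CRM) step.
# rm_a / rm_b are assumed to be torch modules mapping a batch of encoded
# preference responses to scalar reward scores of shape (batch,).
import torch
import torch.nn.functional as F


def pairwise_rm_loss(rm, chosen, rejected):
    """Per-pair Bradley-Terry loss: -log sigmoid(r(chosen) - r(rejected))."""
    return -F.logsigmoid(rm(chosen) - rm(rejected))  # shape: (batch,)


def keep_ratio(step, total_steps, noise_cap=0.4):
    """Assumed curriculum schedule: keep all pairs early, then linearly
    drop up to the estimated noise fraction as both RMs mature."""
    return 1.0 - noise_cap * min(1.0, step / (0.3 * total_steps))


def crm_step(rm_a, rm_b, opt_a, opt_b, chosen, rejected, step, total_steps):
    k = max(1, int(keep_ratio(step, total_steps) * chosen.size(0)))

    with torch.no_grad():
        loss_a = pairwise_rm_loss(rm_a, chosen, rejected)
        loss_b = pairwise_rm_loss(rm_b, chosen, rejected)
        # Peer review (assumed co-teaching style): each RM nominates its
        # k lowest-loss pairs as likely clean; the *other* RM trains on them.
        pick_a = loss_a.topk(k, largest=False).indices  # A's nomination -> B
        pick_b = loss_b.topk(k, largest=False).indices  # B's nomination -> A

    opt_a.zero_grad()
    pairwise_rm_loss(rm_a, chosen[pick_b], rejected[pick_b]).mean().backward()
    opt_a.step()

    opt_b.zero_grad()
    pairwise_rm_loss(rm_b, chosen[pick_a], rejected[pick_a]).mean().backward()
    opt_b.step()
```

Under this reading, neither RM trains on pairs it selected itself, which is what lets the two models cross-check each other's likely-noisy examples; the annealed keep ratio keeps their capabilities roughly synchronized, and the 0.4 cap mirrors the 40% extreme-noise setting reported in the abstract.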

Submission history

From: Mingxu Chai
[v1] Thu, 15 May 2025 10:58:20 UTC (4,803 KB)
[v2] Mon, 19 May 2025 03:28:14 UTC (6,702 KB)


