
Rubric-Scaffolded Reinforcement Learning for General LLM Reasoning

Authors: Yang Zhou, Sunzhu Li, Shunyu Liu, Wenkai Fang, Jiale Zhao, Jingwen Yang, Jianwei Lv, Kongcheng Zhang, Yihe Zou, Hengtong Lu, Wei Chen, Yan Xie, Mingli Song

View a PDF of the paper titled Breaking the Exploration Bottleneck: Rubric-Scaffolded Reinforcement Learning for General LLM Reasoning


Abstract: Recent advances in Large Language Models (LLMs) have underscored the potential of Reinforcement Learning (RL) to facilitate the emergence of reasoning capabilities. Despite the encouraging results, a fundamental dilemma persists: RL improvement relies on learning from high-quality samples, yet the exploration for such samples remains bounded by the inherent limitations of LLMs. In effect, this creates an undesirable cycle in which what cannot be explored cannot be learned. In this work, we propose Rubric-Scaffolded Reinforcement Learning (RuscaRL), a novel instructional scaffolding framework designed to break the exploration bottleneck for general LLM reasoning. Specifically, RuscaRL introduces checklist-style rubrics as (1) explicit scaffolding for exploration during rollout generation, where different rubrics are provided as external guidance within task instructions to steer diverse high-quality responses. This guidance is gradually decayed over time, encouraging the model to internalize the underlying reasoning patterns; (2) verifiable rewards for exploitation during model training, where we can obtain robust LLM-as-a-Judge scores using rubrics as references, enabling effective RL on general reasoning tasks. Extensive experiments demonstrate the superiority of the proposed RuscaRL across various benchmarks, effectively expanding reasoning boundaries under best-of-N evaluation. Notably, RuscaRL significantly boosts Qwen2.5-7B-Instruct from 23.6 to 50.3 on HealthBench-500, surpassing GPT-4.1. Furthermore, our fine-tuned variant of Qwen3-30B-A3B-Instruct achieves 61.1 on HealthBench-500, outperforming leading LLMs including OpenAI-o3. This work is still in progress, and we will release the code, the models, and the datasets soon.
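The two roles the abstract assigns to rubrics can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not give the decay schedule, the prompt template, or the scoring rule, so the linear decay, the prompt wording, the weighted-fraction reward, and every name below (`Criterion`, `scaffold_ratio`, `build_prompt`, `rubric_reward`, `keyword_judge`) are assumptions made for illustration. The keyword check is a toy stand-in for an LLM-as-a-Judge call.

```python
from dataclasses import dataclass


@dataclass
class Criterion:
    text: str      # checklist item shown to the model and to the judge
    points: float  # weight of the item (assumed positive in this sketch)


def scaffold_ratio(step: int, total_steps: int) -> float:
    """Fraction of rubric items revealed at a given training step.

    Assumption: linear decay from 1.0 to 0.0. The abstract only says the
    guidance is gradually decayed over time.
    """
    return max(0.0, 1.0 - step / total_steps)


def build_prompt(task: str, rubric: list[Criterion], ratio: float) -> str:
    """Append the first `ratio` share of rubric items to the task instruction."""
    shown = rubric[: round(ratio * len(rubric))]
    if not shown:
        return task
    hints = "\n".join(f"- {c.text}" for c in shown)
    return f"{task}\n\nA good answer should satisfy:\n{hints}"


def rubric_reward(response: str, rubric: list[Criterion], judge) -> float:
    """Weighted share of rubric items that the judge marks as satisfied."""
    total = sum(c.points for c in rubric)
    earned = sum(c.points for c in rubric if judge(response, c))
    return earned / total if total else 0.0


def keyword_judge(response: str, criterion: Criterion) -> bool:
    """Toy stand-in for an LLM judge: looks for the criterion's last word."""
    return criterion.text.split()[-1].lower() in response.lower()


task = "Advise a patient with a fever."
rubric = [
    Criterion("Recommends seeing a doctor", 2.0),
    Criterion("Mentions possible dehydration", 1.0),
]
prompt_early = build_prompt(task, rubric, scaffold_ratio(0, 100))    # full scaffold
prompt_late = build_prompt(task, rubric, scaffold_ratio(100, 100))   # no scaffold
reward = rubric_reward("Please see a doctor soon.", rubric, keyword_judge)
```

Early in training the prompt carries the whole checklist, while late in training it is the bare task, so the policy has to produce rubric-satisfying answers without being told the rubric. The reward is computed against the full rubric in both cases.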

Submission history

From: Yang Zhou
[v1] Sat, 23 Aug 2025 08:47:31 UTC (3,938 KB)
[v2] Tue, 26 Aug 2025 10:52:15 UTC (3,917 KB)


2025-08-27 04:00:00
