[2410.19982] Random Policy Enables In-Context Reinforcement Learning within Trust Horizons

View a PDF of the paper titled "Random Policy Enables In-Context Reinforcement Learning within Trust Horizons", by Weiqin Chen and 1 other authors


Abstract: Pretrained foundation models (FMs) have shown remarkable performance in in-context learning, enabling zero-shot generalization to new tasks not encountered during training. In the case of reinforcement learning (RL), in-context RL (ICRL) emerges when FMs are pretrained on decision-making problems in an autoregressively supervised manner. However, state-of-the-art ICRL algorithms, such as Algorithm Distillation, Decision Pretrained Transformer, and Decision Importance Transformer, impose stringent requirements on the pretraining dataset concerning the source policies, context information, and action labels. Notably, these algorithms either demand optimal policies or require, to varying degrees, well-trained behavior policies for all pretraining environments. This significantly hinders the application of ICRL to real-world scenarios, where acquiring optimal or well-trained policies for a substantial number of real-world training environments can be intractable. To overcome this challenge, we introduce a novel approach, termed State-Action Distillation (SAD), that allows generating an effective pretraining dataset guided solely by random policies. In particular, SAD selects query states and corresponding action labels by distilling outstanding state-action pairs from the entire state and action spaces using random policies within a trust horizon, and then inherits the classical autoregressively supervised mechanism during pretraining. To our knowledge, this is the first work that enables effective ICRL under random policies and random contexts. We also establish quantitative trustworthiness analysis as well as performance guarantees for SAD. Moreover, our empirical results across multiple popular ICRL benchmark environments demonstrate that, on average, SAD outperforms the best baseline by 236.3% in the offline evaluation and by 135.2% in the online evaluation.
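The data-generation step described in the abstract can be sketched in code. The following is a minimal toy illustration of the idea, not the authors' implementation: the chain MDP, the function names, the trust horizon of 3, and the rollout count are all invented for the example. Each query state is labeled with the action whose Monte-Carlo return under a random continuation policy, truncated at the trust horizon, is highest.

```python
import random

random.seed(0)  # reproducibility for this sketch

# Toy 1-D chain MDP: states 0..4, actions {0: left, 1: right};
# reward 1.0 each time the agent lands on the rightmost state.
N_STATES = 5
ACTIONS = [0, 1]

def step(state, action):
    nxt = max(0, min(N_STATES - 1, state + (1 if action == 1 else -1)))
    return nxt, (1.0 if nxt == N_STATES - 1 else 0.0)

def random_return(state, action, horizon, n_rollouts=30):
    """Monte-Carlo estimate of the return of taking `action` in `state`
    and then following a uniformly random policy, truncated after
    `horizon` steps (the trust horizon)."""
    total = 0.0
    for _ in range(n_rollouts):
        s, r = step(state, action)
        ret = r
        for _ in range(horizon - 1):
            s, r = step(s, random.choice(ACTIONS))
            ret += r
        total += ret
    return total / n_rollouts

def build_dataset(horizon=3):
    """Label every query state with the action whose estimated
    random-policy return within the trust horizon is highest."""
    dataset = []
    for s in range(N_STATES):
        best = max(ACTIONS, key=lambda a: random_return(s, a, horizon))
        dataset.append((s, best))
    return dataset

data = build_dataset()
print(data)
```

The resulting (state, action) pairs would then serve as query states and action labels for the usual supervised pretraining stage; everything here only ever executes random policies.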

Submission history

From: Weiqin Chen [view email]
[v1]

Fri, 25 Oct 2024 21:46:25 UTC (1,081 KB)
[v2]

Tue, 14 Jan 2025 06:18:03 UTC (614 KB)
[v3]

Fri, 2 May 2025 03:19:49 UTC (1,033 KB)
