[2504.05410] Fast Controlled Generation from Language Models with Adaptive Weighted Rejection Sampling

Authors: Benjamin Lipkin, Benjamin LeBrun, Jacob Hoover Vigly, João Loula, David R. MacIver, Li Du, Jason Eisner, Ryan Cotterell, Vikash Mansinghka, Timothy J. O'Donnell, Alexander K.

Abstract: The dominant approach to generating from language models subject to some constraint is locally constrained decoding (LCD), incrementally sampling tokens at each time step such that the constraint is never violated. Typically, this is achieved through token masking: looping over the vocabulary and excluding non-conforming tokens. There are two important problems with this approach. (i) Evaluating the constraint on every token can be prohibitively expensive, since LM vocabularies often exceed 100,000 tokens. (ii) LCD can distort the global distribution over strings, sampling tokens based only on local information, even if they lead down dead-end paths. This work introduces a new algorithm that addresses both these problems. First, to avoid evaluating a constraint on the full vocabulary at each step of generation, we propose an adaptive rejection sampling algorithm that typically requires orders of magnitude fewer constraint evaluations. Second, we show how this algorithm can be extended to produce low-variance, unbiased estimates of importance weights at a very small additional cost; these estimates can be soundly used within previously proposed sequential Monte Carlo algorithms to correct for the myopic behavior of local constraint enforcement. Through extensive empirical evaluation in text-to-SQL, molecular synthesis, goal inference, pattern matching, and JSON domains, we show that our approach is superior to state-of-the-art baselines, supporting a broader class of constraints and improving both runtime and performance. Additional theoretical and empirical analyses show that our method's runtime efficiency is driven by its dynamic use of computation, scaling with the divergence between the unconstrained and constrained LM, and as a consequence, runtime improvements are greater for better models.
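The contrast between token masking and rejection-based sampling described in the abstract can be illustrated with a toy Python sketch. This is not the authors' implementation, and it does not reproduce their importance-weight estimator. The function names `masked_sample` and `adaptive_rejection_sample`, the Zipf-like toy distribution, and the even-token-id constraint are all hypothetical. Masking checks the constraint on every vocabulary entry and, as a by-product, computes the allowed probability mass `z` (the local normalizer that sequential Monte Carlo methods use as the incremental importance weight for a locally constrained proposal). The rejection variant draws from the unconstrained distribution, checks only the drawn token, and zeroes out rejected tokens before redrawing. Both return a token from the same renormalized distribution over allowed tokens.

```python
import random
from typing import Callable, Sequence, Tuple

# Hypothetical constraint interface: token id -> does it keep the output valid?
Constraint = Callable[[int], bool]


def masked_sample(probs: Sequence[float], allowed: Constraint,
                  rng: random.Random) -> Tuple[int, float, int]:
    """Token masking (LCD): check the constraint on every vocabulary entry,
    then sample from the renormalized allowed tokens.

    Returns (token, z, checks), where z is the probability mass of the
    allowed tokens (the local normalizer) and checks == len(probs).
    """
    weights = [p if allowed(t) else 0.0 for t, p in enumerate(probs)]
    z = sum(weights)
    if z == 0.0:
        raise ValueError("no token satisfies the constraint")
    token = rng.choices(range(len(probs)), weights=weights)[0]
    return token, z, len(probs)


def adaptive_rejection_sample(probs: Sequence[float], allowed: Constraint,
                              rng: random.Random) -> Tuple[int, int]:
    """Rejection sampling without replacement: draw from the unconstrained
    distribution, check only the drawn token, and zero out rejected tokens
    before redrawing. The accepted token follows the same distribution as
    masked_sample. Returns (token, checks).

    Note: this toy version still does O(V) arithmetic per draw; what it saves
    is calls to `allowed`, the potentially expensive constraint check.
    """
    weights = list(probs)
    checks = 0
    while sum(weights) > 0.0:
        token = rng.choices(range(len(weights)), weights=weights)[0]
        checks += 1
        if allowed(token):
            return token, checks
        weights[token] = 0.0  # never redraw a token that was already rejected
    raise ValueError("no token satisfies the constraint")


if __name__ == "__main__":
    rng = random.Random(0)
    vocab_size = 5_000
    raw = [1.0 / (i + 1) for i in range(vocab_size)]  # toy Zipf-like "LM" distribution
    total = sum(raw)
    probs = [w / total for w in raw]

    def even_only(t: int) -> bool:  # hypothetical constraint
        return t % 2 == 0

    tok, z, mask_checks = masked_sample(probs, even_only, rng)
    print("masking:   token", tok, "checks", mask_checks, "allowed mass z", round(z, 3))
    tok, ars_checks = adaptive_rejection_sample(probs, even_only, rng)
    print("rejection: token", tok, "checks", ars_checks)
```

In this toy setup the masking call always reports one check per vocabulary entry, while the rejection sampler's count depends on how much probability mass the constraint rules out: when most of the mass already satisfies the constraint, it usually stops after one or two checks. That dependence is a simplified picture of the abstract's point that runtime scales with the gap between the unconstrained and constrained LM.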

Submission history

From: Benjamin Lipkin

[v1] Monday, 7 April 2025 18:30:18 UTC (4,650 KB)
[v2] Monday, 18 August 2025 16:10:18 UTC (4,625 KB)

2025-08-19 04:00:00
