[2504.05410] Fast Controlled Generation from Language Models with Adaptive Weighted Rejection Sampling

[Submitted on 7 Apr 2025 (v1), last revised 18 Aug 2025 (this version, v2)]

Authors:Benjamin Lipkin, Benjamin LeBrun, Jacob Hoover Vigly, João Loula, David R. MacIver, Li Du, Jason Eisner, Ryan Cotterell, Vikash Mansinghka, Timothy J. O'Donnell, Alexander K.

Abstract: The dominant approach to generating from language models subject to some constraint is locally constrained decoding (LCD), incrementally sampling tokens at each time step such that the constraint is never violated. Typically, this is achieved through token masking: looping over the vocabulary and excluding non-conforming tokens. There are two important problems with this approach. (i) Evaluating the constraint on every token can be prohibitively expensive, since LM vocabularies often exceed 100,000 tokens. (ii) LCD can distort the global distribution over strings, sampling tokens based only on local information, even if they lead down dead-end paths. This work introduces a new algorithm that addresses both problems. First, to avoid evaluating a constraint on the full vocabulary at each step of generation, we propose an adaptive rejection sampling algorithm that typically requires orders of magnitude fewer constraint evaluations. Second, we show how this algorithm can be extended to produce low-variance, unbiased estimates of importance weights at a very small additional cost; these estimates can be soundly used within previously proposed sequential Monte Carlo algorithms to correct for the myopic behavior of local constraint enforcement. Through extensive empirical evaluation in text-to-SQL, molecular synthesis, goal inference, pattern matching, and JSON domains, we show that our approach is superior to state-of-the-art baselines, supporting a broader class of constraints and improving both runtime and performance. Additional theoretical and empirical analyses show that the runtime efficiency of our method is driven by its dynamic use of computation, scaling with the divergence between the unconstrained and constrained LM; as a consequence, runtime improvements are greater for better models.
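
To make the contrast between token masking and rejection sampling concrete, here is a minimal Python sketch. It is not the authors' implementation: the function names (`masked_sample`, `adaptive_rejection_sample`), the toy vocabulary, and the constraint are all assumptions made for illustration. The sketch covers only the first idea in the abstract (sample a token, check the constraint on that token alone, and remove it from the pool if it fails) and does not attempt the importance-weight estimator or the sequential Monte Carlo correction.

```python
import random
from typing import Callable, Dict, Tuple


def masked_sample(probs: Dict[str, float], ok: Callable[[str], bool],
                  rng: random.Random) -> Tuple[str, int]:
    """Token-masking baseline: checks the constraint on every token."""
    allowed = {t: p for t, p in probs.items() if p > 0 and ok(t)}
    calls = len(probs)  # one constraint evaluation per vocabulary entry
    if not allowed:
        raise ValueError("no token satisfies the constraint")
    tokens = list(allowed)
    tok = rng.choices(tokens, weights=[allowed[t] for t in tokens])[0]
    return tok, calls


def adaptive_rejection_sample(probs: Dict[str, float], ok: Callable[[str], bool],
                              rng: random.Random) -> Tuple[str, int]:
    """Hypothetical sketch: sample, check one token, drop it on rejection.

    The accepted token follows the same distribution as masked_sample
    (probs renormalized over allowed tokens), but the constraint is only
    evaluated on tokens that were actually drawn.
    """
    remaining = {t: p for t, p in probs.items() if p > 0}
    calls = 0
    while remaining:
        tokens = list(remaining)
        tok = rng.choices(tokens, weights=[remaining[t] for t in tokens])[0]
        calls += 1
        if ok(tok):
            return tok, calls
        del remaining[tok]  # adapt: never propose a rejected token again
    raise ValueError("no token satisfies the constraint")


# Toy next-token distribution and constraint (both assumptions).
toy_probs = {"SELECT": 0.6, "FROM": 0.3, "!!": 0.05, "??": 0.03, "##": 0.02}
toy_ok = str.isalpha

rng = random.Random(0)
print(masked_sample(toy_probs, toy_ok, rng))
print(adaptive_rejection_sample(toy_probs, toy_ok, rng))
```

In this toy setting most of the probability mass already satisfies the constraint, so the rejection sampler usually stops after one or two checks, while masking always pays for the full vocabulary. This mirrors the abstract's claim that the cost scales with how far the unconstrained model is from the constrained one.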