Periodic KV Cache Abstraction for Generalised Reasoning

Abstract: Despite their impressive capabilities, large language models struggle with generalisation beyond their training distribution, often exhibiting sophisticated pattern interpolation rather than true abstract reasoning (extrapolation). In this work, we approach this limitation through the lens of Information Bottleneck (IB) theory, which posits that model generalisation emerges from an optimal balance between compressing the inputs and retaining predictive information in latent representations. We prove using IB theory that decoder-only Transformers are inherently constrained in their ability to form task-optimal sequence representations. We then use this result to demonstrate that periodic global transformation of the internal sequence-level representations (the KV cache) is a necessary computational step for improving Transformer generalisation in reasoning tasks. Based on these theoretical insights, we propose a modification to the Transformer architecture, in the form of an additional module that globally rewrites the KV cache at periodic intervals, shifting its capacity away from memorising input prefixes and toward encoding the features most useful for predicting future tokens. Our model delivers substantial gains on mathematical reasoning benchmarks, outperforming both vanilla Transformers with up to 3.5x more parameters and heuristic-driven pruning mechanisms for cache compression. Our approach can be seen as a principled generalisation of existing KV-cache compression methods; whereas such methods focus solely on compressing input representations, they often do so at the expense of retaining predictive information, and thus their capabilities are inherently bounded by those of an unconstrained model. This establishes a principled framework for manipulating Transformer memory using information theory, addressing fundamental reasoning limitations that scaling alone cannot overcome.
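The idea of rewriting the KV cache at periodic intervals can be illustrated with a toy decoding loop. This is only a minimal sketch under stated assumptions, not the authors' module: the names `decode_with_periodic_rewrite`, `mean_mix_rewrite`, `period`, and `alpha` are hypothetical, and the mean-mixing function is a hand-written stand-in for whatever learned global transformation the paper actually uses.

```python
from typing import Callable, List, Tuple

Vector = List[float]
Cache = List[Tuple[Vector, Vector]]  # one (key, value) pair per cached token


def mean_mix_rewrite(cache: Cache, alpha: float = 0.5) -> Cache:
    """Hypothetical stand-in for a learned rewrite module.

    Blends every cached key and value with the cache-wide mean, so each
    slot is updated using information from the whole sequence (a "global"
    transformation). The cache length is unchanged: entries are rewritten,
    not pruned.
    """
    if not cache:
        return []
    n = len(cache)
    dim_k = len(cache[0][0])
    dim_v = len(cache[0][1])
    mean_k = [sum(k[i] for k, _ in cache) / n for i in range(dim_k)]
    mean_v = [sum(v[i] for _, v in cache) / n for i in range(dim_v)]
    return [
        (
            [(1 - alpha) * x + alpha * m for x, m in zip(k, mean_k)],
            [(1 - alpha) * x + alpha * m for x, m in zip(v, mean_v)],
        )
        for k, v in cache
    ]


def decode_with_periodic_rewrite(
    steps: int,
    period: int,
    make_kv: Callable[[int], Tuple[Vector, Vector]],
    rewrite: Callable[[Cache], Cache],
) -> Tuple[Cache, List[int]]:
    """Append one (key, value) pair per step; rewrite the cache every `period` steps."""
    cache: Cache = []
    rewrite_steps: List[int] = []
    for t in range(1, steps + 1):
        cache.append(make_kv(t))    # assumed: the model emits K/V for token t
        if t % period == 0:
            cache = rewrite(cache)  # global rewrite of the whole cache
            rewrite_steps.append(t)
    return cache, rewrite_steps
```

Calling `decode_with_periodic_rewrite(6, 3, make_kv, mean_mix_rewrite)` triggers the rewrite after steps 3 and 6. Keeping every slot while changing its contents is the design point that separates this from pruning-style cache compression, which only drops entries.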
Submission history
From: Adnan Omerje
[v1]
Thursday, 22 May 2025 17:33:49 UTC (2,590 KB)
[v2]
Thursday, 5 June 2025 13:38:34 UTC (2,590 KB)
2025-06-06 04:00:00