New model design could fix high enterprise AI costs
Enterprise leaders struggling with the high cost of deploying AI models may get a reprieve thanks to a new architectural design.
While the capabilities of generative AI are attractive, its enormous computational requirements for both training and inference drive significant overhead and growing environmental concerns. At the heart of this inefficiency lies the models’ “fundamental bottleneck”: an autoregressive process that generates text sequentially, token by token.
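To make the bottleneck concrete, here is a minimal sketch of the standard decoding loop, with a dummy stand-in for the transformer forward pass (all names and sizes are illustrative): generating N new tokens requires N sequential model calls, none of which can be parallelized.

```python
import random

def forward(prefix_ids):
    """Dummy stand-in for a transformer forward pass over the whole prefix."""
    return [random.random() for _ in range(100)]  # fake logits over a tiny vocab

def sample(logits):
    """Greedy pick; a real decoder would sample from softmax(logits)."""
    return max(range(len(logits)), key=logits.__getitem__)

def generate(prefix_ids, n_new):
    ids = list(prefix_ids)
    for _ in range(n_new):                # one full forward pass per token:
        ids.append(sample(forward(ids)))  # N new tokens = N sequential steps
    return ids

print(generate([1, 2, 3], n_new=5))
```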
For organizations processing massive streams of data, from IoT networks to financial markets, this limitation makes long-form analysis slow and economically difficult. However, a new research paper from Tencent AI and Tsinghua University suggests an alternative.
A new approach to AI efficiency
The paper introduces Continuous Autoregressive Language Models (CALM), a method that re-engineers the generation process to predict a continuous vector instead of a discrete token.
A high-fidelity autoencoder “compresses a chunk of K tokens into a single continuous vector”, which carries a much higher semantic bandwidth.
Instead of processing “the”, “cat”, “sat” in three separate steps, the model compresses them into a single step. This design directly “reduces the number of generative steps” and attacks the computational overhead at its source.
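A loose sketch of the idea in PyTorch, using a toy linear autoencoder and made-up dimensions (the paper’s actual high-fidelity autoencoder is far more sophisticated):

```python
import torch
import torch.nn as nn

K, d_tok, d_vec = 4, 64, 128            # illustrative sizes, not the paper's

encoder = nn.Linear(K * d_tok, d_vec)   # chunk of K token embeddings -> 1 vector
decoder = nn.Linear(d_vec, K * d_tok)   # 1 vector -> K token embeddings

chunk = torch.randn(1, K * d_tok)       # e.g. embeddings of "the cat sat on"
z = encoder(chunk)                      # the single continuous vector the LM predicts
recon = decoder(z)                      # recovers the K tokens at decode time

# The backbone now autoregresses over vectors z_1, z_2, ... so generating
# 1,000 tokens takes 1,000 / K = 250 steps instead of 1,000.
```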
Experimental results demonstrate a better trade-off between performance and computation. A CALM model that groups four tokens per vector delivered performance “comparable to strong discrete baselines, but at a significantly lower computational cost”.
For example, one CALM model requires 44 percent fewer training FLOPs and 34 percent fewer inference FLOPs than a baseline Transformer of comparable capability. This points to savings in both the initial capital expenditure of training and the recurring operational expenditure of inference.
Rebuilding the toolkit for the continuous domain
The move from a finite, discrete vocabulary to an infinite, continuous vector space breaks the standard LLM toolkit. The researchers had to develop a “comprehensive likelihood-free framework” to make the new model viable.
For training, the model cannot use a standard softmax layer or maximum-likelihood estimation. To solve this, the team paired the model with a generative head trained on a “likelihood-free” objective, which rewards the model for accurate predictions without computing explicit probabilities.
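The article does not spell out the objective, but a standard likelihood-free loss of this kind is the energy score, a strictly proper scoring rule that can be evaluated from samples alone. The PyTorch sketch below illustrates that principle under this assumption; the shapes and the helper name `energy_loss` are illustrative, not the paper’s exact formulation.

```python
import torch

def energy_loss(samples: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
    """Energy-score loss: E||X - y|| - 0.5 * E||X - X'|| over model samples X.
    Minimising it rewards samples close to the target while keeping them
    diverse, and no probability density is ever computed."""
    m = samples.shape[0]
    fidelity = (samples - target).norm(dim=-1).mean()                # E||X - y||
    diversity = torch.cdist(samples, samples).sum() / (m * (m - 1))  # E||X - X'||
    return fidelity - 0.5 * diversity

# Toy usage: 8 candidate vectors from the generative head vs. the true vector.
loss = energy_loss(torch.randn(8, 128), torch.randn(128))
print(loss.item())
```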
This new training method also requires a new evaluation metric. Standard metrics such as perplexity are inapplicable because they rely on the very probabilities the model no longer computes.
The team proposed BrierLM, a new metric based on the Brier score that can be estimated purely from model samples. Validation confirmed BrierLM as a reliable alternative, showing a “Spearman rank correlation of -0.991” with traditional cross-entropy loss.
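The trick that makes a sample-only estimate possible: for a categorical distribution p, E[1{x1 == x2}] = sum_k p_k^2 and E[1{x1 == y}] = p_y, so every term of the Brier score sum_k (p_k - 1[y == k])^2 can be estimated from draws alone. A toy sketch of that principle (illustrative, not necessarily BrierLM’s exact recipe):

```python
import random

def brier_estimate(sample_fn, y, n=10_000):
    """Estimate the Brier score sum_k (p_k - 1[y == k])^2 from samples only,
    using E[1{x1 == x2}] = sum_k p_k^2 and E[1{x1 == y}] = p_y."""
    total = 0.0
    for _ in range(n):
        x1, x2 = sample_fn(), sample_fn()
        total += (x1 == x2) - 2 * (x1 == y) + 1
    return total / n

# Toy model: a fixed categorical distribution we can only sample from.
p = [0.7, 0.2, 0.1]
draw = lambda: random.choices(range(3), weights=p)[0]
print(brier_estimate(draw, y=0))  # ~= 0.7^2 + 0.2^2 + 0.1^2 - 2*0.7 + 1 = 0.14
```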
Finally, the framework restores controlled generation, an essential feature for enterprise use. Standard temperature sampling is impossible without a probability distribution, so the paper introduces a “likelihood-free sampling algorithm”, including a practical approximation method for managing the trade-off between output accuracy and diversity.
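One classic likelihood-free way to implement low temperatures at the discrete-token level, sketched below under the assumption that it captures the spirit of the paper’s exact algorithm: for an integer n, n independent draws that all agree occur with probability proportional to p(x)^n, which is exactly temperature 1/n. Acceptance becomes rare as n grows, which is why a practical approximation is needed.

```python
import random

def cold_sample(sample_fn, n, max_tries=100_000):
    """Temperature-1/n sampling from draws alone: n independent samples
    that all agree occur with probability proportional to p(x)^n, so an
    accepted value follows the sharpened distribution p^n / sum p^n."""
    for _ in range(max_tries):
        draws = [sample_fn() for _ in range(n)]
        if all(d == draws[0] for d in draws):
            return draws[0]
    raise RuntimeError("acceptance too rare; fall back to an approximation")

p = [0.6, 0.3, 0.1]
draw = lambda: random.choices(range(3), weights=p)[0]
print(cold_sample(draw, n=3))  # returns 0 roughly 88% of the time
```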
Reducing enterprise AI costs
This research offers a glimpse into a future where generative AI is defined not only by ever-increasing numbers of parameters, but by architectural efficiency.
The current trajectory of model scaling is hitting a wall of diminishing returns and rising costs. The CALM framework establishes “a new design axis for LLM scaling: increasing the semantic bandwidth of each generative step.”
Although this is a research framework and not an off-the-shelf product, it points to a robust and scalable path toward highly efficient language models. When evaluating vendor roadmaps, technology leaders must look beyond model size and start asking about architectural efficiency.
The ability to reduce FLOPs per generated token will become a decisive competitive advantage, enabling AI to be deployed more economically and sustainably across the enterprise – from the data center to data-intensive edge applications.
See also: Flawed AI standards put enterprise budgets at risk


