Samsung’s tiny AI model beats giant reasoning LLMs

A new paper by an AI researcher at Samsung explains how a small network can outperform massive large language models (LLMs) in complex reasoning.

In the race for AI supremacy, the industry mantra has often been “bigger is better.” Tech giants have spent billions to create ever-larger models, but according to Alexia Jolicoeur-Martineau of Samsung SAIL Montréal, a radically different and more efficient path forward is possible with the Tiny Recursive Model (TRM).

Using a model with just 7 million parameters, less than 0.01% of the size of leading LLMs, TRM achieves new state-of-the-art results on notoriously challenging benchmarks such as the ARC-AGI intelligence test. Samsung’s work challenges the common assumption that sheer scale is the only way to advance the capabilities of AI models, offering a more sustainable and parameter-efficient alternative.

Overcoming the limits of size

While LLMs have shown incredible facility at generating human-like text, their ability to perform complex, multi-step reasoning can be fragile. Because they generate answers one token at a time, a single mistake early in the process can derail the entire solution and produce an invalid final answer.

Techniques such as chain-of-thought reasoning, where a model “thinks out loud” to work through a problem, have been developed to mitigate this. However, these methods are computationally expensive, often require large amounts of high-quality reasoning data that may not be available, and can still produce flawed logic. Even with these reinforcements, LLMs stumble on puzzles that demand perfect logical execution.

Samsung’s work builds on a recent AI model known as the Hierarchical Reasoning Model (HRM). HRM introduced a novel method in which two small neural networks repeatedly iterate over a problem at different frequencies to refine the answer. It showed great promise but was complex, relying on uncertain biological arguments and on fixed-point theorems whose conditions were not guaranteed to hold.

Instead of two networks, TRM uses a single small network that iteratively refines both its internal latent reasoning and its proposed answer.

The model is given the question, an initial guess at the answer, and a latent reasoning feature. It first runs through several steps that refine its latent reasoning based on all three inputs. Then, using this improved reasoning, it updates its prediction of the final answer. This entire process can be repeated up to 16 times, allowing the model to progressively correct its own errors in a highly parameter-efficient manner.
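To make the refine-then-update loop concrete, here is a minimal PyTorch-style sketch of the idea. The class name, layer sizes, and step counts are illustrative assumptions for this article, not the paper’s actual code.

```python
import torch
import torch.nn as nn

class TinyRecursiveSketch(nn.Module):
    """Illustrative sketch of a TRM-style refinement loop (names and sizes are hypothetical)."""

    def __init__(self, dim: int = 64, latent_steps: int = 6, cycles: int = 16):
        super().__init__()
        # One tiny network refines the latent reasoning from question, answer, and latent state...
        self.reason = nn.Sequential(nn.Linear(dim * 3, dim), nn.ReLU(), nn.Linear(dim, dim))
        # ...and the answer is then updated from the current answer and the refined reasoning.
        self.answer = nn.Sequential(nn.Linear(dim * 2, dim), nn.ReLU(), nn.Linear(dim, dim))
        self.latent_steps = latent_steps
        self.cycles = cycles

    def forward(self, x, y, z):
        # x: embedded question, y: current answer guess, z: latent reasoning state
        for _ in range(self.cycles):
            for _ in range(self.latent_steps):
                z = self.reason(torch.cat([x, y, z], dim=-1))  # refine the latent reasoning
            y = self.answer(torch.cat([y, z], dim=-1))         # update the proposed answer
        return y, z

# Example usage with random tensors
model = TinyRecursiveSketch(dim=64)
x, y, z = torch.randn(2, 64), torch.zeros(2, 64), torch.zeros(2, 64)
y_final, z_final = model(x, y, z)
```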

Counterintuitively, the research found that a tiny network with just two layers generalized far better than a four-layer version. This reduction in size appears to prevent the model from overfitting, a common problem when training on small, specialised datasets.

TRM also dispenses with the complex mathematical justifications used by its predecessor. The original HRM required the assumption that its functions converge to a fixed point in order to justify its training method. TRM bypasses this entirely by back-propagating through the full recursion process. This change alone provided a huge boost in performance, improving accuracy on the Sudoku-Extreme benchmark from 56.5% to 87.4% in an ablation study.
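In training terms, the difference is roughly the following: rather than approximating gradients under a fixed-point assumption, the whole unrolled recursion stays in the autograd graph and gradients flow back through every refinement step. This is a hedged sketch with placeholder names that reuses the illustrative model above, not the paper’s implementation.

```python
import torch
import torch.nn.functional as F

def training_step(model, optimizer, x, y0, z0, target):
    """One optimisation step that back-propagates through the full recursion (illustrative)."""
    y, z = model(x, y0, z0)           # the entire unrolled loop is kept in the autograd graph
    loss = F.mse_loss(y, target)      # loss choice is a placeholder; the real tasks use token-level losses
    optimizer.zero_grad()
    loss.backward()                   # gradients flow through every refinement iteration
    optimizer.step()
    return loss.item()
```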

Samsung’s model beats AI benchmarks with fewer resources

The results speak for themselves. On the Sudoku-Extreme dataset, which uses only 1,000 training examples, TRM achieves a test accuracy of 87.4%, a huge jump from HRM’s 55%. On Maze-Hard, a task that involves finding long paths through 30×30 mazes, TRM scores 85.3% compared to HRM’s 74.5%.

Most notably, TRM makes huge strides on the Abstraction and Reasoning Corpus (ARC-AGI), a benchmark designed to measure true fluid intelligence in AI. With only 7 million parameters, TRM achieves 44.6% accuracy on ARC-AGI-1 and 7.8% on ARC-AGI-2. This beats HRM, which used a 27-million-parameter model, and even surpasses many of the world’s largest LLMs. For comparison, Gemini 2.5 Pro scores just 4.9% on ARC-AGI-2.

TRM’s training process was also made more efficient. An adaptive computation time mechanism (ACT), which decides when the model has refined an answer enough and can move on to a new data sample, was simplified to remove the need for a second, costly forward pass through the network at each training step. The change made no significant difference to final generalization.
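One way to picture the simplified halting decision is a small learned head that reads the latent state produced by the same forward pass and outputs a stop/continue signal, so no extra pass is needed. The class, names, and threshold below are hypothetical illustrations, not the mechanism’s exact form.

```python
import torch
import torch.nn as nn

class HaltingHead(nn.Module):
    """Hypothetical ACT-style halting head: decide from the latent state whether to stop."""

    def __init__(self, dim: int = 64):
        super().__init__()
        self.proj = nn.Linear(dim, 1)

    def forward(self, z: torch.Tensor, threshold: float = 0.5) -> torch.Tensor:
        p_halt = torch.sigmoid(self.proj(z)).squeeze(-1)  # probability the answer is good enough
        return p_halt > threshold                          # True -> move on to the next sample
```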

This research from Samsung makes a compelling case against the current trajectory of ever-expanding AI models. It shows that by designing architectures that can iteratively reason and self-correct, it is possible to solve extremely difficult problems with a fraction of the computational resources.

See also: Google’s new AI agent rewrites code to automate vulnerability fixes

Want to learn more about AI and Big Data from industry leaders? Check out the AI & Big Data Expo taking place in Amsterdam, California, and London. This comprehensive event is part of TechEx and is co-located with other leading technology events including the Cyber Security Expo. Click here for more information.

AI News is powered by TechForge Media. Explore other upcoming enterprise technology events and webinars here.
