AI Developers Look Beyond Chain-of-Thought Prompting

Since OpenAI’s launch of ChatGPT in 2022, artificial intelligence companies have been locked in a race to build ever-larger models, pouring huge sums into the construction of data centers. But toward the end of last year, there was grumbling that the benefits of scaling were hitting a wall. The underwhelming performance of one of OpenAI’s largest models, GPT-4.5, gave further weight to the idea.
That stagnation has prompted a shift in focus toward making machines “think” more like humans. Instead of building ever-larger models, researchers are now giving them more time to think through problems. In 2022, a Google team introduced chain-of-thought (CoT) prompting, in which LLMs work through a problem step by step.
This approach underpins the impressive capabilities of a new generation of reasoning models such as OpenAI’s o3, Google’s Gemini 2.5, Anthropic’s Claude 3.7, and DeepSeek’s R1. AI papers are now awash with references to “thought,” “thinking,” and “reasoning,” as the number of cognition-inspired techniques multiplies.
“Since about the spring of last year, it has been clear to anyone serious about AI research that the next revolution will not be about scaling,” says Igor Grossmann, a professor of psychology at the University of Waterloo, in Canada. “The next revolution will be about better cognition.”
How AI Reasoning Works
At their core, LLMs use statistical probabilities to predict the next token (the technical name for the chunks of text they work with) in a sequence of text. But CoT prompting showed that simply prompting models to respond with a series of intermediate “thinking” steps before arriving at an answer greatly boosts performance on math and logic problems.
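To make the contrast concrete, here is a minimal sketch of chain-of-thought prompting, assuming access to an OpenAI-compatible chat-completions endpoint; the model name, prompt wording, and example question are illustrative and not taken from any paper mentioned in this article.

```python
# Minimal sketch of chain-of-thought prompting, assuming an OpenAI-compatible
# chat-completions endpoint; model name and prompt wording are illustrative.
from openai import OpenAI

client = OpenAI()

question = "A train travels 60 km in 45 minutes. What is its average speed in km/h?"

# Direct prompt: ask for the answer only.
direct = client.chat.completions.create(
    model="gpt-4o-mini",  # illustrative model name
    messages=[{"role": "user", "content": f"{question}\nAnswer with just the number."}],
)

# Chain-of-thought prompt: ask the model to reason step by step before answering.
cot = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{
        "role": "user",
        "content": f"{question}\nThink through the problem step by step, "
                   "then give the final answer on its own line.",
    }],
)

print(direct.choices[0].message.content)
print(cot.choices[0].message.content)
```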
“It was a surprise that it worked so well,” says Kanishk Gandhi, a computer science graduate student at Stanford University. Since then, researchers have invented a host of spin-offs of the technique, including “tree of thought,” “plan of thought,” “logic of thought,” and “iteration of thought,” among others.
Leading model developers have also used reinforcement learning to bake this technique into their models, by getting a base model to produce CoT responses and then rewarding those that lead to the best final answers. Through this process, says Gandhi, models have developed a variety of cognitive strategies that mirror how humans solve complex problems, such as breaking them down into simpler tasks and backtracking to correct errors in earlier reasoning steps.
But the way these models are trained can lead to problems, says Michael Saxon, a graduate student at the University of California, Santa Barbara. Reinforcement learning requires a way to verify whether a response is correct in order to decide whether to give a reward. That means reasoning models have mainly been trained on tasks where such verification is easy, such as math, coding, or logic puzzles. As a result, they tend to treat every question as if it were a complicated reasoning problem, which can lead to overthinking, Saxon says.
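As a rough illustration of why easily verifiable tasks dominate this kind of training, the sketch below scores a model response against a known answer; the answer-extraction heuristic and function names are assumptions made for illustration, not the method used by any particular lab.

```python
# Sketch of a verifiable reward signal for reasoning-style RL training. It assumes
# the final answer appears on the last non-empty line of the model's response;
# the extraction heuristic and names here are illustrative assumptions.
import re

def extract_final_answer(response: str) -> str:
    """Take the last non-empty line and pull out the first number, if any."""
    last_line = [ln for ln in response.strip().splitlines() if ln.strip()][-1]
    match = re.search(r"-?\d+(?:\.\d+)?", last_line)
    return match.group(0) if match else last_line.strip()

def reward(response: str, ground_truth: str) -> float:
    """Binary reward: 1.0 if the extracted answer matches the known-correct one."""
    return 1.0 if extract_final_answer(response) == ground_truth.strip() else 0.0

# Example: an easy-to-verify math task.
sample_response = "The train covers 60 km in 0.75 h.\n60 / 0.75 = 80\n80"
print(reward(sample_response, "80"))  # prints 1.0
```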
In a recent experiment described in a preprint paper, he and his colleagues gave various AI models a series of deliberately easy tasks and showed that reasoning models use far more tokens than conventional LLMs to reach a correct answer. In some cases, all that extra thinking even led to worse performance. Interestingly, Saxon says, treating the models the way you might treat a person prone to overthinking proved effective: The researchers had the model estimate how many tokens it would need to solve the problem, then gave it regular updates during the reasoning process on how many it had left before it needed to provide an answer.
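A much-simplified sketch of that budgeting idea is below. It captures only the self-estimation step in a single follow-up prompt, whereas the method described above interleaves countdown updates during generation. The endpoint, model name, and prompt wording are illustrative assumptions.

```python
# Simplified sketch of the token-budget idea: first ask the model to estimate how
# many tokens it needs, then constrain the answer request to that budget.
# Assumes an OpenAI-compatible endpoint; the model name is illustrative.
from openai import OpenAI

client = OpenAI()
MODEL = "gpt-4o-mini"  # illustrative

task = "What is 17 + 25?"

# Step 1: ask the model to estimate its own token budget for this task.
estimate = client.chat.completions.create(
    model=MODEL,
    messages=[{
        "role": "user",
        "content": f"Task: {task}\nEstimate how many tokens of reasoning you need. "
                   "Reply with a single integer.",
    }],
)
try:
    budget = int(estimate.choices[0].message.content.strip())
except ValueError:
    budget = 200  # fallback if the model does not return a bare integer

# Step 2: ask for an answer while reminding the model of its own budget.
answer = client.chat.completions.create(
    model=MODEL,
    messages=[{
        "role": "user",
        "content": f"Task: {task}\nYou budgeted about {budget} tokens of reasoning. "
                   "Stay within that budget, then give the final answer.",
    }],
    max_tokens=budget + 50,  # hard cap slightly above the self-estimate
)
print(answer.choices[0].message.content)
```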
“This has been a recurring lesson,” Saxon says. “Even though models don’t behave like humans in many important ways, approaches inspired by our own cognition can be surprisingly effective.”
Where AI Reasoning Falls Short
There are still significant gaps in these models’ reasoning capabilities, though. Martha Lewis, a professor of artificial intelligence at the University of Amsterdam, recently compared the ability of LLMs and humans to reason using analogies, which is thought to underpin creative thinking.
When tested on standard versions of analogical reasoning tests, both the models and the humans performed well. But when given novel variants of the tests, model performance dropped sharply compared with that of humans. A likely explanation, Lewis says, is that problems similar to the standard versions of these tests were in the models’ training data, and the models were simply using shallow pattern matching to find solutions rather than reasoning. The tests were conducted on the older GPT-3, GPT-3.5, and GPT-4 models, and Lewis says it’s possible that newer reasoning models would do better. But the experiments demonstrate the need for caution when talking about AI’s cognitive capabilities.
“Because the models are so fluent, it’s very easy to feel as if they’re doing something more than they actually are,” says Lewis. “I don’t think we should say these models are reasoning without testing what we actually mean by reasoning in a specific context.”
Another important area where AI reasoning capabilities may fall short is in reasoning about the mental states of others, known as theory of mind. Several papers have shown that LLMs can solve classic psychological tests of this ability, but researchers at the Allen Institute for AI (Ai2) suspected that this flawless performance might be because the tests were included in training data sets.
So the researchers created a new suite of theory-of-mind tests based on real-world situations, which separately measure a model’s ability to infer someone’s mental state, predict how that state affects their behavior, and judge whether their actions are reasonable. For example, the model might be told that someone picks up a sealed bag of chips at the supermarket, but the contents are moldy. It is then asked whether the person knows the chips are moldy, whether they will buy the chips, and whether that would be reasonable.
The team found that while the models were good at predicting mental states, they were bad at predicting behavior and judging reasonableness. Ai2 research scientist Ronan Le Bras suspects this is because the models compute the likelihood of actions based on all the data available to them, and they know, for example, that it is very unlikely someone would buy moldy chips. Even though the models can infer a person’s mental state, they don’t seem to take that state into account when predicting the person’s behavior.
However, the researchers found that reminding the models of their mental-state predictions, or giving them a specific prompt telling them to consider the character’s perspective, improved performance significantly. Yuling Gu, a predoctoral young investigator at Ai2, says it is a matter of getting models to apply the right pattern of thinking to the right kind of problem. “We hope that in the future this kind of reasoning will be baked more deeply into these models,” she says.
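A rough sketch of that kind of perspective-taking nudge, using the chips scenario, might look like the following; the character name, prompt wording, and endpoint are illustrative assumptions and not the prompts used in the Ai2 study.

```python
# Illustrative sketch of a perspective-taking prompt for a theory-of-mind question.
# The scenario wording and character name are assumptions for illustration only.
from openai import OpenAI

client = OpenAI()

scenario = (
    "In a supermarket, Maya picks up a sealed bag of chips. "
    "Unknown to her, the chips inside are moldy."
)

# Plain query: the model may answer from what *it* knows (the chips are moldy).
plain = f"{scenario}\nWill Maya buy the chips?"

# Perspective-taking query: first state the inferred mental state, then ask the
# model to reason from Maya's point of view before predicting her behavior.
perspective = (
    f"{scenario}\n"
    "Step 1: What does Maya believe about the chips?\n"
    "Step 2: Reasoning only from Maya's beliefs (not from what you know), "
    "will she buy the chips, and would that be reasonable?"
)

for prompt in (plain, perspective):
    reply = client.chat.completions.create(
        model="gpt-4o-mini",  # illustrative model name
        messages=[{"role": "user", "content": prompt}],
    )
    print(reply.choices[0].message.content, "\n---")
```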
Can Metacognition Improve AI?
Getting models to reason flexibly across a broad range of tasks may require a more fundamental shift, Grossmann says. Last November, he co-authored a paper with leading AI researchers highlighting the need to imbue models with metacognition, which they describe as “the ability to think about and regulate one’s own thinking.”
Today’s models are “professional bullshitters,” Grossmann says, producing a best guess for any question without the capacity to recognize or communicate their uncertainty. They are also bad at adapting responses to specific contexts and at considering diverse perspectives, things humans do naturally. Giving models these kinds of metacognitive capabilities, Grossmann says, would not only improve performance but also make their reasoning easier to follow.
That will be difficult, he adds, because it would involve either a massive effort to label training data for qualities like certainty or relevance, or adding new modules to the models that do things like evaluate the confidence of reasoning steps. Reasoning models already use far more computing resources and energy than standard LLMs, and adding these extra training requirements or processing loops would likely make the situation worse. “It could put many smaller companies out of business,” Grossmann says. “There are environmental costs tied to that as well.”
Nonetheless, he remains convinced that trying to mimic the cognitive processes behind human intelligence is the clearest path forward, even if most of today’s efforts are highly simplified. “We don’t know of an alternative way of thinking,” he says. “We can only invent things for which we have some kind of conceptual understanding.”