
Alibaba’s ‘ZeroSearch’ lets AI learn to google itself — slashing training costs by 88 percent




Researchers at Alibaba Group have developed a new approach that could dramatically reduce the cost and complexity of training AI systems to search for information, eliminating the need for expensive commercial search engine APIs.

The technique, called "ZeroSearch," allows large language models (LLMs) to develop advanced search capabilities through a simulation approach rather than by interacting with real search engines during training. The innovation could save companies substantial API expenses while giving them greater control over how AI systems learn to retrieve information.

"Reinforcement learning [RL] training requires frequent rollouts, potentially involving hundreds of thousands of search requests, which incur substantial API expenses and severely constrain scalability," the researchers write in their paper published on arXiv this week.

How ZeroSearch trains AI to search without search engines

ZeroSearch addresses a significant problem. Companies developing AI assistants that can autonomously search for information face two main challenges: the unpredictable quality of documents returned by search engines during training, and the prohibitively high cost of making hundreds of thousands of API calls to commercial search engines like Google.

Alibaba's approach begins with a lightweight supervised fine-tuning process that transforms an LLM into a retrieval module capable of generating both relevant and noisy documents in response to queries. During reinforcement learning training, the system uses what the researchers call a "curriculum-based rollout strategy" that gradually degrades the quality of the generated documents.

"Our key insight is that LLMs have acquired extensive world knowledge during large-scale pretraining and are capable of generating relevant documents given a search query," the researchers explain. "The primary difference between a real search engine and a simulation LLM lies in the textual style of the returned content."
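The idea of gradually degrading document quality can be sketched in a few lines of Python. This is a minimal illustration, not the paper's implementation: the function names, the linear schedule, and the 0.5 noise ceiling are all assumptions made for clarity; ZeroSearch's actual curriculum and prompt templates differ.

```python
import random

def noise_probability(step, total_steps, p_start=0.0, p_end=0.5):
    """Hypothetical linear schedule: the chance of serving a noisy
    document rises as training progresses (the paper's curriculum
    uses its own schedule; this is an illustrative stand-in)."""
    frac = min(step / total_steps, 1.0)
    return p_start + frac * (p_end - p_start)

def simulated_search(query, step, total_steps, rng=random):
    """Stand-in for the fine-tuned simulation LLM: decide whether
    this rollout returns a 'useful' or a 'noisy' document."""
    p_noise = noise_probability(step, total_steps)
    doc_type = "noisy" if rng.random() < p_noise else "useful"
    # In ZeroSearch, this choice would select the prompt that asks
    # the simulation LLM for a relevant vs. misleading document.
    return {"query": query, "doc_type": doc_type}
```

Early in training almost every document is useful, so the policy model learns the basics of retrieval; later rollouts mix in more noise, forcing it to cope with the unreliable results a real search engine would return.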

Outperforming Google Search at a fraction of the cost

In comprehensive experiments across seven question-answering datasets, ZeroSearch not only matched but often exceeded the performance of models trained with real search engines. Remarkably, a 7B-parameter retrieval module achieved performance comparable to Google Search, while a 14B-parameter module even surpassed it.

The cost savings are significant. According to the researchers' analysis, training with roughly 64,000 search queries using Google Search via SerpAPI would cost about $586.70, while using a 14B-parameter simulation LLM on four A100 GPUs costs only $70.80, a reduction of about 88%.
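The 88% figure follows directly from the two reported costs; a quick check, using only the numbers given in the researchers' analysis:

```python
# Figures reported in the researchers' analysis (~64,000 queries).
serpapi_cost = 586.70   # Google Search via SerpAPI
sim_llm_cost = 70.80    # 14B simulation LLM on four A100 GPUs

savings = 1 - sim_llm_cost / serpapi_cost
print(f"Cost reduction: {savings:.1%}")  # → Cost reduction: 87.9%
```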

"This demonstrates the feasibility of using a well-trained LLM as a substitute for real search engines in reinforcement learning setups," the paper notes.

What this means for the future of AI development

This breakthrough represents a major shift in how AI systems can be trained. ZeroSearch shows that AI can improve without relying on external tools like search engines.

The impact could be substantial for the AI industry. Until now, training AI systems to search has often required expensive API calls to services controlled by large technology companies. ZeroSearch changes that equation by letting an AI simulate search instead of using actual search engines.

For smaller AI companies and startups with limited budgets, this approach could level the playing field. The high cost of API calls has been a major barrier to entry in developing advanced AI assistants. By cutting those costs by nearly 90%, ZeroSearch makes advanced AI training far more accessible.

Beyond cost savings, the technique gives developers more control over the training process. With real search engines, the quality of returned documents is unpredictable. With simulated search, developers can precisely control what information the AI sees during training.

The technique works across multiple model families, including Qwen-2.5 and LLaMA-3.2, and with both base and instruction-tuned variants. The researchers have released their code, datasets, and pre-trained models on GitHub and Hugging Face, allowing other researchers and companies to implement the approach.

As large language models continue to advance, techniques like ZeroSearch suggest a future in which AI systems develop increasingly sophisticated capabilities through self-simulation rather than reliance on external services, potentially reshaping the economics of AI development and reducing dependence on large technology platforms.

The irony is clear: in teaching AI to search without search engines, Alibaba may have created a technology that makes traditional search engines less essential for AI development. As these systems become more self-sufficient, the technology landscape could look very different in just a few years.




2025-05-08 19:15:00
