Silicon Valley bets big on ‘environments’ to train AI agents

For years, top executives at major technology companies have laid out visions of artificial intelligence agents that can autonomously use software applications to complete tasks for people. But take today's consumer agents for a spin, whether it's OpenAI's ChatGPT Agent or Perplexity's Comet, and you'll quickly realize how limited the technology still is. Making AI agents more capable may take a new set of techniques that the industry is still discovering.
One of those techniques is carefully simulated workspaces where agents can be trained on multi-step tasks, known as reinforcement learning (RL) environments. Much like labeled datasets powered the last wave of AI, RL environments are starting to look like a critical ingredient in the development of agents.
AI researchers, founders, and investors tell TechCrunch that leading AI labs are now demanding more RL environments, and there's no shortage of startups hoping to supply them.
"All the big AI labs are building RL environments in-house," Jennifer Li, general partner at Andreessen Horowitz, said in an interview with TechCrunch. "But as you can imagine, creating these datasets is very complex, so AI labs are also looking at third-party vendors that can create high-quality environments and evaluations. Everyone is looking at this space."
The push for RL environments has minted a new class of well-funded startups, such as Mechanize and Prime Intellect, that aim to lead the space. Meanwhile, large data labeling companies like Mercor and Surge say they're investing more in RL environments to keep pace with the industry's shift from static datasets to interactive simulations. The major labs are weighing big investments as well: according to The Information, Anthropic leaders have discussed spending more than $1 billion on RL environments over the next year.
The hope for investors and founders is that one of these startups emerges as the "Scale AI for environments," a reference to the $29 billion data labeling powerhouse that powered the chatbot era.
The question is whether RL environments will truly push the frontier of AI progress.
What is an RL environment?
At their core, RL environments are training grounds that simulate what an AI agent would be doing in a real software application. One founder, in a recent interview, described building them as "like creating a very boring video game."
For example, an environment could simulate a Chrome browser and task an AI agent with purchasing a pair of socks on Amazon. The agent is graded on its performance and sent a reward signal when it succeeds (in this case, buying a suitable pair of socks).
While such a task sounds relatively simple, there are lots of places where an AI agent could stumble. It might get lost navigating the web page's drop-down menus, or buy too many socks. And because developers can't predict exactly which wrong turn an agent will take, the environment itself has to be robust enough to capture any unexpected behavior and still deliver useful feedback. That makes building environments far more complex than building a static dataset.
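To make the idea concrete, here is a minimal sketch in the style of the open-source Gymnasium API. The class, actions, and reward logic are hypothetical and greatly simplified; a real environment would simulate a full browser session rather than a handful of named actions.

```python
# A minimal, illustrative sketch of an RL environment (hypothetical, not any
# vendor's actual product). The agent must add exactly one pair of socks to
# the cart and then check out.
from typing import Dict, Tuple

class BuySocksEnv:
    """Toy task: buy exactly one pair of socks, then check out."""

    def reset(self) -> Dict[str, int]:
        self.cart = 0  # pairs of socks currently in the cart
        return {"cart": self.cart}

    def step(self, action: str) -> Tuple[Dict[str, int], float, bool]:
        reward, done = 0.0, False
        if action == "add_to_cart":
            self.cart += 1
        elif action == "open_menu":
            reward = -0.1  # small penalty for wandering into drop-down menus
        elif action == "checkout":
            done = True
            # Reward signal: success only if exactly one pair was purchased.
            reward = 1.0 if self.cart == 1 else -1.0
        return {"cart": self.cart}, reward, done

# Example episode: an agent that buys two pairs earns a negative reward.
env = BuySocksEnv()
env.reset()
for action in ["add_to_cart", "add_to_cart", "checkout"]:
    obs, reward, done = env.step(action)
print(obs, reward, done)  # {'cart': 2} -1.0 True
```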
Some environments are quite elaborate, allowing AI agents to use tools, access the internet, or work across different software applications to complete a given task. Others are narrower, aimed at helping agents learn specific tasks in enterprise software applications.
While RL environments are the hot thing in Silicon Valley right now, there's plenty of precedent for the technique. One of OpenAI's first projects back in 2016 was building "RL Gyms," which were quite similar to the modern conception of environments. That same year, Google DeepMind's AlphaGo beat a world champion at the board game Go, and it, too, used RL techniques within a simulated environment.
What's unique about today's environments is that researchers are trying to build computer-using AI agents on top of large transformer models. Unlike AlphaGo, which was a specialized AI system operating in a closed environment, today's AI agents are trained to have more general capabilities. AI researchers today have a stronger starting point, but also a more complicated goal where more can go wrong.
A crowded field
AI data labeling companies like Scale AI, Surge, and Mercor are trying to meet the moment and build out RL environments. These companies have more resources than many startups in the space, as well as deep relationships with AI labs.
Surge CEO Edwin Chen tells TechCrunch he's recently seen a "significant increase" in demand for RL environments within AI labs. Surge, which reportedly generated $1.2 billion in revenue last year from working with AI labs such as OpenAI, Google, and Meta, recently spun up a new internal organization specifically tasked with building RL environments, he said.
Close behind Surge is Mercor, a startup valued at $10 billion, which has also worked with OpenAI, Meta, and Anthropic. Mercor is pitching investors on its business building RL environments for domain-specific tasks such as coding, healthcare, and law, according to marketing materials seen by TechCrunch.
"Few understand how large the opportunity around RL environments really is."
Scale AI used to dominate the data labeling space, but it has lost ground since Meta invested $14 billion and hired away its CEO. Since then, Google and OpenAI have dropped Scale AI as a data provider, and the company even faces competition for data labeling work inside Meta. But Scale is still trying to meet the moment and build environments.
"This is just the nature of the business [Scale AI] is in," said Chetan Rane, Scale AI's head of product for agents and RL environments. "We did this in the early days of autonomous vehicles, our first business unit. When ChatGPT came out, Scale AI adapted to that. And now, once again, we're adapting to new frontier spaces like agents and environments."
Some newer players are focusing exclusively on environments from the outset. Among them is Mechanize, a startup founded roughly six months ago with the audacious goal of "automating all jobs." However, co-founder Matthew Barnett tells TechCrunch that his company is starting with RL environments for AI coding agents.
Mechanize aims to supply AI labs with a small number of robust RL environments, Barnett says, rather than acting like the larger data companies that create a wide range of simpler ones. To that end, the startup is offering software engineers $500,000 salaries to build RL environments, far more than an hourly contractor could earn working at Scale AI or Surge.
Mechanize has already been working with Anthropic on RL environments, two sources familiar with the matter tell TechCrunch. Mechanize and Anthropic declined to comment on the partnership.
Other startups are betting that RL environments will be influential outside of AI labs. Prime Intellect, a startup backed by AI researcher Andrej Karpathy, Founders Fund, and Menlo Ventures, is targeting smaller developers with its RL environments.
Last month, Prime Intellect launched an RL environments hub, which aims to be a "Hugging Face for RL environments." The idea is to give open source developers access to the same resources that large AI labs have, and to sell those developers access to computational resources in the process.
Training generally capable agents in RL environments can be more computationally expensive than previous AI training techniques, according to Prime Intellect researcher Will Brown. Alongside the startups building RL environments, there's another opportunity here for GPU providers that can power the process.
"RL environments are going to be too big for any one company to dominate," Brown said in an interview. "Part of what we're doing is just trying to build good open-source infrastructure around it. The service we sell is compute, so it's a convenient onramp to using GPUs, but we're thinking about this more in the long term."
Will it scale?
The open question around RL environments is whether the technique will scale like previous AI training methods.
Learning to reinforce some of the largest jumps in artificial intelligence over the past year, including models such as Openai’s O1 and Claude OPUS 4.
Environments are part of AI labs' larger bet on RL, which many believe will continue to drive progress as they pour more data and computational resources into the process. Some of the researchers behind o1 previously told TechCrunch that the company originally invested in AI reasoning models, which were created through investments in RL and test-time compute, because they believed the approach would scale well.
The best way to scale RL remains unclear, but environments seem like a promising contender. Instead of simply rewarding chatbots for text responses, they let agents operate in simulations with tools and computers at their disposal. That's far more resource-intensive, but potentially more rewarding.
Some are skeptical that all these RL environments will pan out. Ross Taylor, a former AI research lead at Meta who has since co-founded his own startup, tells TechCrunch that RL environments are prone to reward hacking, a process in which AI models cheat their way to a reward without really doing the task.
"I think people are underestimating how difficult it is to scale environments," said Taylor. "Even the best publicly available [RL environments] typically don't work without serious modification."
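Reward hacking, the failure mode Taylor describes, is easiest to see with a hypothetical sketch (not drawn from any lab's actual grader): a reward function that only inspects what the agent shows on screen can be gamed, while one that also verifies the underlying state is harder to fool.

```python
# Hypothetical illustration of reward hacking in a shopping environment.
def naive_reward(page_html: str) -> float:
    # Only checks the visible page, so an agent that merely surfaces a cached
    # "Order confirmed" banner gets full credit without buying anything.
    return 1.0 if "Order confirmed" in page_html else 0.0

def robust_reward(page_html: str, backend_orders: list) -> float:
    # Also verifies the underlying state: exactly one real order must exist.
    confirmed = "Order confirmed" in page_html
    return 1.0 if confirmed and len(backend_orders) == 1 else 0.0
```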
Sherwin Wu, OpenAI's head of engineering, said on a recent podcast that he was "short" on RL environment startups. Wu noted that it's a highly competitive space, but also that AI research is evolving so quickly that it's hard to serve AI labs well.
Karpathy, an investor in Prime Intellect who has called RL environments a potential breakthrough, has also expressed caution about RL more broadly. In a post on X, he raised concerns about how much more progress can be squeezed out of RL.
"I am bullish on environments and agentic interactions, but I am bearish on reinforcement learning specifically," Karpathy said.
Correction: A previous version of this article referred to Mechanize as Mechanize Work. It has been updated to reflect the company's official name.