RoboCat: A self-improving robotic agent

research
A new foundation agent learns to operate different robotic arms, solves tasks from as few as 100 demonstrations, and improves from self-generated data.
Robots are quickly becoming part of our everyday lives, but they are often programmed to perform only specific tasks well. While harnessing recent advances in AI could lead to robots that help in many more ways, progress in building general-purpose robots is slowed in part by the time needed to collect real-world training data.
Our latest paper introduces a self-improving AI agent for robotics, RoboCat, that learns to perform a variety of tasks across different robotic arms and then self-generates new training data to improve its technique.
Previous research has explored how to develop robots that can learn to multi-task at scale and how to combine the understanding of language models with the real-world capabilities of a helper robot. RoboCat is the first agent to solve and adapt to multiple tasks and to do so across different, real robots.
RoboCat learns much faster than other state-of-the-art models. It can pick up a new task with as few as 100 demonstrations because it draws from a large and diverse dataset. This capability will help accelerate robotics research, as it reduces the need for human-supervised training, and it is an important step towards creating a general-purpose robot.
How RoboCat improves itself
RoboCat is based on our Gato model (Spanish for "cat"), which can process language, images, and actions in both simulated and physical environments. We combined Gato's architecture with a large training dataset of sequences of images and actions from various robot arms solving hundreds of different tasks.
After this first round of training, we launched RoboCat into a "self-improvement" training cycle with a set of previously unseen tasks. The learning of each new task followed five steps:
- Collect 100-1000 demonstrations of a new task or robot, using a robotic arm controlled by a human.
- Fine-tune RoboCat on this new task/arm, creating a specialized spin-off agent.
- The spin-off agent practices on this new task/arm an average of 10,000 times, generating more training data.
- Incorporate the demonstration data and self-generated data into RoboCat's existing training dataset.
- Train a new version of RoboCat on the new training dataset.
RoboCat's training cycle, boosted by its ability to autonomously generate additional training data.
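The five steps above can be sketched as a simple loop. This is a minimal illustration only, not the actual training code: the `Trajectory` and `Agent` classes and every function name here are hypothetical stand-ins, the "training" functions merely record which tasks appear in the data, and retraining after every single task is an assumption about how the cycle is scheduled.

```python
from dataclasses import dataclass


@dataclass
class Trajectory:
    task: str
    source: str  # "demonstration" or "self_generated"


@dataclass
class Agent:
    """Toy stand-in for a trained policy: it only records the tasks it has seen."""
    version: int
    known_tasks: frozenset = frozenset()


def collect_demonstrations(task, n):
    # Step 1: human-controlled demonstrations (placeholders here).
    return [Trajectory(task, "demonstration") for _ in range(n)]


def fine_tune(generalist, task, demos):
    # Step 2: create a specialized spin-off agent for this task/arm.
    return Agent(generalist.version, generalist.known_tasks | {task})


def practise(spin_off, task, n_episodes):
    # Step 3: the spin-off agent practices, generating more data.
    return [Trajectory(task, "self_generated") for _ in range(n_episodes)]


def train_generalist(dataset, version):
    # Step 5: train a new generalist on the whole dataset.
    return Agent(version, frozenset(t.task for t in dataset))


def self_improvement_cycle(agent, dataset, new_tasks,
                           n_demos=500, n_practice=10_000):
    for task in new_tasks:
        demos = collect_demonstrations(task, n_demos)
        spin_off = fine_tune(agent, task, demos)
        rollouts = practise(spin_off, task, n_practice)
        dataset = dataset + demos + rollouts  # Step 4: merge both data sources.
        agent = train_generalist(dataset, agent.version + 1)
    return agent, dataset
```

Calling `self_improvement_cycle(Agent(0), [], ["gear_pickup", "shape_matching"])` with these made-up task names returns a later-version agent together with a dataset that holds both the demonstrations and the self-generated trajectories for each task.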
The combination of all this training means the latest RoboCat is based on a dataset of millions of trajectories, from both real and simulated robotic arms, including self-generated data. We used four different types of robots and many robotic arms to collect vision-based data representing the tasks RoboCat would be trained to perform.
RoboCat learns from a diverse range of training data types and tasks: videos of a real robotic arm picking up objects, a simulated arm stacking blocks, and RoboCat operating a robotic arm.
Learning to operate new robotic arms and solve more complex tasks
With RoboCat's diverse training, it learned to operate different robotic arms within a few hours. Although it had been trained on arms with two-pronged grippers, it was able to adapt to a more complex arm with a three-fingered gripper and twice as many controllable inputs.
Left: A new robotic arm RoboCat learned to control
Right: Video of RoboCat using the arm to pick up gears
After observing 1000 human-controlled demonstrations, collected in just hours, RoboCat could direct this new arm well enough to pick up gears successfully 86% of the time. With the same number of demonstrations, it could adapt to solve tasks that combine precision and understanding, such as removing the correct fruit from a bowl and solving a shape-matching puzzle, which are necessary for more complex control.
Examples of tasks RoboCat can adapt to solving after 500-1000 demonstrations.
The self-improving generalist
RoboCat has a virtuous cycle of training: the more new tasks it learns, the better it gets at learning additional new tasks. The initial version of RoboCat was successful just 36% of the time on previously unseen tasks, after learning from 500 demonstrations per task. But the latest RoboCat, which had trained on a greater diversity of tasks, more than doubled this success rate on the same tasks.
The big difference in performance between the initial RoboCat (one round of training) and the final version (extensive and diverse training, including self-improvement) after both versions were fine-tuned on 500 demonstrations of previously unseen tasks.
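To make the "more than doubled" comparison concrete: with an initial success rate of 36%, the final rate must exceed 72%. The snippet below is only a sketch of that arithmetic. The helper names are hypothetical, and the episode outcomes in the usage note are made-up placeholders, not reported results.

```python
def success_rate(outcomes):
    """Fraction of successful episodes; `outcomes` is a sequence of booleans."""
    if not outcomes:
        raise ValueError("need at least one episode")
    return sum(outcomes) / len(outcomes)


def more_than_doubled(initial_rate, final_rate):
    """True if the final success rate is strictly greater than twice the initial one."""
    return final_rate > 2 * initial_rate


INITIAL_RATE = 0.36                    # reported for the initial RoboCat
DOUBLING_THRESHOLD = 2 * INITIAL_RATE  # the final rate must exceed this value
```

For instance, a hypothetical evaluation with three successes in four episodes gives `success_rate([True, True, True, False])`, which would satisfy `more_than_doubled(INITIAL_RATE, ...)`, whereas a 50% success rate would not.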
These improvements were due to RoboCat's growing breadth of experience, similar to how people develop a more diverse range of skills as they deepen their learning in a given domain. RoboCat's ability to independently learn skills and rapidly self-improve, especially when applied to different robotic devices, will help pave the way toward a new generation of more helpful, general-purpose robotic agents.
2023-06-20 00:00:00