Yann LeCun Team’s New Research: Revolutionizing Visual Navigation with Navigation World Models

Navigation is an essential skill for any visually capable organism and a critical tool for survival. It enables agents to locate resources, find shelter, and avoid threats. In humans, navigation often involves mentally simulating possible future trajectories while taking constraints and alternatives into account. However, modern robotic navigation systems are much less flexible. Current navigation policies are usually "hard-coded", which means that introducing new constraints after training is difficult. Moreover, current supervised visual navigation models cannot easily allocate additional computational resources when facing more complex navigation tasks.

To address these issues, in a new paper, Navigation World Models, a research team from Meta, New York University and Berkeley AI Research proposes a Navigation World Model (NWM), a controllable video generation model that predicts future visual observations based on past observations and navigation actions. This model enables agents to simulate potential navigation plans and evaluate their feasibility before taking action.

NWM is trained on a large collection of videos and navigation actions collected from various robotic agents. The model learns to predict the representations of future video frames, given the representations of past frames and the corresponding navigation actions. After training, NWM can plan navigation trajectories in new environments by simulating candidate trajectories and checking whether they lead to the target destination.
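The idea of planning by simulation can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: `toy_world_model` is a hypothetical stand-in that tracks a 2D position instead of predicting video frames, and random sampling of action sequences is one simple planning strategy, not necessarily the procedure used in the paper.

```python
import random

def toy_world_model(state, action):
    """Hypothetical stand-in for a learned world model.

    A real model would predict the next visual observation; here the
    'state' is just an (x, y) position and the action is a displacement.
    """
    x, y = state
    dx, dy = action
    return (x + dx, y + dy)

def rollout(model, state, actions):
    """Simulate a sequence of actions and return the final predicted state."""
    for action in actions:
        state = model(state, action)
    return state

def distance(a, b):
    """Euclidean distance between two 2D points."""
    return ((a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2) ** 0.5

def plan(model, start, goal, horizon=8, num_candidates=256, seed=0):
    """Sample candidate action sequences, simulate each, keep the best one."""
    rng = random.Random(seed)
    moves = [(1, 0), (-1, 0), (0, 1), (0, -1), (0, 0)]
    best_actions, best_cost = None, float("inf")
    for _ in range(num_candidates):
        candidate = [rng.choice(moves) for _ in range(horizon)]
        predicted_end = rollout(model, start, candidate)
        cost = distance(predicted_end, goal)  # how far from the target?
        if cost < best_cost:
            best_actions, best_cost = candidate, cost
    return best_actions, best_cost

if __name__ == "__main__":
    actions, cost = plan(toy_world_model, start=(0, 0), goal=(3, 2))
    print("best action sequence:", actions)
    print("predicted distance to goal:", cost)
```

The key point is that every candidate is evaluated inside the model before any action is taken, so adding a new constraint only requires changing the cost function, not retraining the model.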

Conceptually, NWM is inspired by recent diffusion-based world models, such as DIAMOND and GameNGen, which are used for model-based reinforcement learning. However, unlike these models, NWM is trained across a wide range of environments and agent embodiments. By taking advantage of this data diversity, the researchers successfully trained a diffusion transformer model that can generalize across multiple environments. This generalization is a significant departure from previous models, which are often restricted to specific environments or tasks.

NWM also shares conceptual similarities with novel view synthesis (NVS) methods like NeRF and GDC. However, while NVS methods aim to reconstruct 3D scenes from 2D images, NWM's goal is more ambitious: it seeks to train a single model capable of navigating across diverse environments. Unlike NVS approaches, NWM does not rely on 3D priors but instead models temporal dynamics directly from natural video data.

The core technical component of NWM is the Conditional Diffusion Transformer (CDiT), which predicts the next visual state given past image states and actions as inputs. Unlike a standard Diffusion Transformer (DiT), CDiT offers much better computational efficiency. Its complexity is linear in the number of context frames, allowing it to scale to larger models with up to 1 billion parameters across diverse environments and agent embodiments. This efficiency allows CDiT to require four times fewer FLOPs than a standard DiT while delivering superior prediction results.
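To see why linear scaling in the number of context frames matters, the following rough sketch counts query-key pairs in attention. It rests on assumptions not stated in this article: that a standard DiT applies self-attention over the tokens of all frames jointly, and that a CDiT-style block applies self-attention only to the target frame's tokens and reaches the context frames through cross-attention. The function names are hypothetical, and the counts are not meant to reproduce the reported FLOPs figure.

```python
def dit_attention_cost(context_frames, tokens_per_frame):
    """Query-key pairs when all frames attend to each other jointly.

    Assumption: self-attention runs over every token of every frame,
    so the cost grows quadratically with the number of frames.
    """
    total_tokens = (context_frames + 1) * tokens_per_frame
    return total_tokens ** 2

def cdit_attention_cost(context_frames, tokens_per_frame):
    """Query-key pairs when only the target frame issues queries.

    Assumption: self-attention over the target frame, plus
    cross-attention from the target tokens to all context tokens,
    so the cost grows linearly with the number of context frames.
    """
    self_attention = tokens_per_frame ** 2
    cross_attention = tokens_per_frame * (context_frames * tokens_per_frame)
    return self_attention + cross_attention

if __name__ == "__main__":
    tokens = 16  # illustrative number of tokens per frame
    for m in (1, 2, 4, 8):
        joint = dit_attention_cost(m, tokens)
        conditional = cdit_attention_cost(m, tokens)
        print(f"context={m}: joint={joint}, conditional={conditional}")
```

Under these assumptions, doubling the context length doubles the growth of the conditional cost but quadruples the joint cost, which is the gap that makes longer contexts and larger models affordable.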

The research team conducted extensive experiments to validate NWM's capabilities. One notable experiment involved using NWM in unfamiliar environments, where it benefited from training on unlabeled video data, free of actions and rewards, from the Ego4D dataset. Qualitatively, NWM showed improved video prediction and generation from single images. Quantitatively, it achieved more accurate future predictions on the Stanford Go dataset when trained with additional unlabeled video data. These results highlight NWM's ability to generalize effectively to unseen environments, which is a major advantage for real-world navigation tasks.

In short, the Navigation World Model (NWM) represents a strong leap forward for robotic navigation. Its ability to simulate, plan under, and adapt to new constraints makes it a promising approach to building more autonomous and flexible systems.

The project page is available here. The paper Navigation World Models is on arXiv.


Author: Hecate He | Editor: Chain Zhang


2024-12-09 21:01:00
