MonoSOWA: Scalable Monocular 3D Object Detector Without Human Annotations

Abstract: Inferring the 3D position and orientation of objects from a single RGB camera is a foundational task in computer vision with many important applications. Traditionally, 3D object detection methods are trained in a fully-supervised setup, requiring large amounts of human annotations, which are laborious, costly, and do not scale well with the increasing amounts of data being captured.
We present a novel method to train a 3D object detector from a single RGB camera without domain-specific human annotations, making orders of magnitude more data available for training. The method uses a newly proposed local object motion model to disentangle object movement between subsequent frames, is approximately 700 times faster than previous work, and compensates for camera focal length differences to aggregate multiple datasets.
The method is evaluated on three public datasets, where despite using no human labels it outperforms prior work by a significant margin. It also demonstrates its versatility as a pre-training tool for fully-supervised training, showing that combining pseudo-labels from multiple datasets can achieve accuracy comparable to using human labels from a single dataset. The source code and model will be published soon.
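The abstract mentions compensating for camera focal length differences so that data from multiple datasets can be aggregated. One common way to realize this under a pinhole camera model (a plausible sketch, not necessarily the paper's exact formulation; the function name and canonical focal length of 700 px are illustrative assumptions) is to rescale metric depths to a canonical camera, since an object of fixed physical size projects to the same pixel extent when depth scales proportionally with focal length:

```python
def to_canonical_depth(depth_m: float, focal_px: float,
                       f_canonical_px: float = 700.0) -> float:
    """Map a metric depth observed by a camera with focal length
    `focal_px` (in pixels) to an equivalent depth under a canonical
    camera with focal length `f_canonical_px`.

    Pinhole model: pixel size of an object is proportional to f / depth,
    so rescaling depth by f_canonical / f makes detections from cameras
    with different intrinsics geometrically comparable.
    """
    return depth_m * f_canonical_px / focal_px


# Example: an object 10 m away seen by a 1400 px focal-length camera
# corresponds to 5 m under the 700 px canonical camera.
d = to_canonical_depth(10.0, focal_px=1400.0, f_canonical_px=700.0)
```

At training time, such a normalization would let pseudo-labels from heterogeneous datasets be pooled in one canonical space, with the inverse mapping applied per camera at inference.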
Submission history
From: Jan Skvrna
[v1]
Thu, 16 Jan 2025 11:35:22 UTC (13,543 KB)
[v2]
Mon, 10 Mar 2025 12:27:10 UTC (46,739 KB)