[2404.06479] Visually Descriptive Language Model for Vector Graphics Reasoning

View a PDF of the paper titled Visually Descriptive Language Model for Vector Graphics Reasoning, by Zhenhailong Wang and 6 other authors
Abstract: Despite significant advancements, large multimodal models (LMMs) still struggle to bridge the gap between low-level visual perception (focusing on shapes, sizes, and layouts) and high-level language reasoning, such as semantics and logic. This limitation is evident in tasks that require precise visual perception, like comparing geometric properties or solving visual reasoning problems. To study this failure mode, we focus on vector graphics: images composed of 2D objects and shapes, prevalent in LMM-based tasks in web, design, and OS environments. We identify two key research questions: how can we enable precise visual perception, and how can we facilitate high-level reasoning based on such low-level perceptions? To capture fine visual details, we use Scalable Vector Graphics (SVG) for accurate encoding of visual scenes. However, SVGs are not readily interpretable by LMMs in a zero-shot manner. To tackle this, we propose the Visually Descriptive Language Model (VDLM), which introduces a Primal Visual Description (PVD) as an intermediate textual representation. PVD translates SVGs into a text-based abstraction consisting of primitive attributes (e.g., shape, position, measurement) and their corresponding values. PVD can be learned with task-agnostic synthesized data and represents visual primitives that are universal across vector graphics. This abstraction is more structured, allowing direct interpretation by foundation models for zero-shot generalization. Without human-annotated data, empirical results show that VDLM significantly improves state-of-the-art LMMs like GPT-4o on various multimodal perception and reasoning tasks. Extensive analyses of VDLM show improved interpretability due to its disentangled perception and reasoning. We also demonstrate a positive correlation between PVD quality and task performance. Project page: this https URL
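To make the idea of a text-based abstraction of primitive attributes more concrete, the following is a minimal illustrative sketch, not the authors' method. In the paper, PVD is learned from synthesized data; here a hand-written rule simply maps basic SVG shape elements to attribute/value records. The function name `describe_svg` and the field names (`type`, `center`, `radius`, `top_left`, `width`, `height`) are hypothetical and chosen only for illustration.

```python
import json
import xml.etree.ElementTree as ET

SVG_NS = "{http://www.w3.org/2000/svg}"


def describe_svg(svg_text):
    """Map basic SVG shapes to attribute/value records.

    Hypothetical schema for illustration only; the PVD format used by
    VDLM is produced by a learned model and may differ.
    """
    root = ET.fromstring(svg_text)
    primitives = []
    for elem in root.iter():
        # Strip the XML namespace so tags read as "circle", "rect", ...
        tag = elem.tag.replace(SVG_NS, "")
        if tag == "circle":
            primitives.append({
                "type": "circle",
                "center": [float(elem.get("cx", 0)), float(elem.get("cy", 0))],
                "radius": float(elem.get("r", 0)),
            })
        elif tag == "rect":
            primitives.append({
                "type": "rectangle",
                "top_left": [float(elem.get("x", 0)), float(elem.get("y", 0))],
                "width": float(elem.get("width", 0)),
                "height": float(elem.get("height", 0)),
            })
    return primitives


svg = """<svg xmlns="http://www.w3.org/2000/svg" width="100" height="100">
  <circle cx="30" cy="40" r="10"/>
  <rect x="50" y="50" width="20" height="30"/>
</svg>"""

description = describe_svg(svg)
# The JSON text is the kind of structured description a language model
# could read directly instead of raw SVG markup.
print(json.dumps(description, indent=2))
```

A structured description of this kind could be placed in a prompt alongside a question such as "which shape is larger?", which is the separation of perception from reasoning that the abstract describes.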
Submission history
From: Zhenhailong Wang
[v1]
Tue, 9 Apr 2024 17:30:18 UTC (2,755 KB)
[v2]
Wed, 10 Apr 2024 02:12:27 UTC (2,755 KB)
[v3]
Fri, 24 May 2024 19:40:26 UTC (2,755 KB)
[v4]
Thu, 3 Oct 2024 21:59:32 UTC (3,770 KB)
[v5]
Thu, 12 Jun 2025 17:46:36 UTC (1,785 KB)