[2509.06945] Interleaving Reasoning for Better Text-to-Image Generation

Authors: Wenxuan Huang, Shuang Chen, Zheyong Xie, Shaosheng Cao, Shixiang Tang, Yufan Shen, Qingyu Yin, Wenbo Hu, Xiaoman Wang, Yuntian Tang, Junbo Qiao, Yue Guo, Yao Hu, Zhenfei Yin, Philip Torr,

View a PDF of the paper titled Interleaving Reasoning for Better Text-to-Image Generation, by Wenxuan Huang and 17 other authors

Abstract: Unified multimodal understanding and generation models have recently achieved significant improvements in image generation capability, yet a large gap remains in instruction following and detail preservation compared to systems that tightly couple comprehension with generation, such as GPT-4o. Motivated by recent advances in interleaving reasoning, we explore whether such reasoning can further improve Text-to-Image (T2I) generation. We introduce Interleaving Reasoning Generation (IRG), a framework that alternates between text-based thinking and image synthesis: the model first produces text-based thinking to guide an initial image, then reflects on the result to refine fine-grained details, visual quality, and aesthetics while preserving semantics. To train IRG effectively, we propose Interleaving Reasoning Generation Learning (IRGL), which targets two sub-goals: (1) strengthening the initial think-and-generate stage to establish core content and base quality, and (2) enabling high-quality textual reflection and faithful implementation of those refinements in a subsequent image. We curate IRGL-300K, a dataset organized into six decomposed learning modes that jointly cover learning text-based thinking and full thinking-image trajectories. Starting from a unified foundation model that natively emits interleaved text-image outputs, our two-stage training first builds robust thinking and reflection, then efficiently tunes the IRG pipeline on the full thinking-image trajectory data. Extensive experiments show SoTA performance, yielding absolute gains of 5 to 10 points on GenEval, WISE, TIIF, GenAI-Bench, and OneIG-EN, alongside substantial improvements in visual quality and fine-grained fidelity. The code, model weights, and datasets will be released at: this https URL.
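The alternation the abstract describes (think, generate an initial image, reflect, generate a refined image) can be illustrated with a minimal Python sketch. This is not the authors' code: the UnifiedModel interface, the generate_text and generate_image method names, the IRGResult container, and the EchoModel stand-in are all hypothetical, and images are represented as plain strings so the example runs without any model weights.

```python
from dataclasses import dataclass
from typing import Protocol


class UnifiedModel(Protocol):
    """Hypothetical interface for a model that emits interleaved text and images."""

    def generate_text(self, context: list) -> str: ...

    def generate_image(self, context: list) -> str: ...


@dataclass
class IRGResult:
    initial_thinking: str
    initial_image: str
    reflection: str
    refined_image: str


def interleaved_generate(model: UnifiedModel, prompt: str) -> IRGResult:
    """Run one think -> image -> reflect -> image pass (assumed control flow)."""
    context = [("prompt", prompt)]

    # Step 1: text-based thinking that guides the initial image.
    thinking = model.generate_text(context)
    context.append(("thinking", thinking))

    # Step 2: initial image conditioned on the prompt and the thinking.
    initial_image = model.generate_image(context)
    context.append(("image", initial_image))

    # Step 3: textual reflection on the initial image.
    reflection = model.generate_text(context)
    context.append(("reflection", reflection))

    # Step 4: refined image that implements the reflection.
    refined_image = model.generate_image(context)
    return IRGResult(thinking, initial_image, reflection, refined_image)


class EchoModel:
    """Toy stand-in that reports what it was conditioned on; not a real model."""

    def generate_text(self, context: list) -> str:
        return f"text after {context[-1][0]}"

    def generate_image(self, context: list) -> str:
        return f"image after {context[-1][0]}"


if __name__ == "__main__":
    result = interleaved_generate(EchoModel(), "a red cube on a blue sphere")
    print(result)
```

Keeping every intermediate output in a single growing context mirrors the idea of one model producing an interleaved text-image sequence. Whether the released implementation structures its context this way is an assumption.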

Submission history

From: Wenxuan Huang
[v1] Monday, 8 Sep 2025 17:56:23 UTC (45,023 KB)
[v2] Tuesday, 9 Sep 2025 10:50:30 UTC (45,023 KB)

2025-09-09 04:00:00
