Snowflake Proposes ExCoT: A Novel AI Framework that Iteratively Optimizes Open-Source LLMs by Combining CoT Reasoning with off-Policy and on-Policy DPO, Relying Solely on Execution Accuracy as Feedback

Text-to-SQL translation, the task of converting natural language queries into structured SQL statements, is essential for user-friendly database interaction. However, the task involves significant complexities, notably schema linking, handling compositional SQL syntax, and resolving ambiguities in user queries. While large language models (LLMs) have shown strong capabilities across various domains, the effectiveness of structured reasoning techniques such as chain-of-thought (CoT) in text-to-SQL settings remains limited. Prior attempts using zero-shot CoT or Direct Preference Optimization (DPO) without intermediate reasoning yielded only marginal improvements, indicating the need for more rigorous methodologies.
Snowflake introduces ExCoT, a structured framework for optimizing open-source LLMs by combining CoT reasoning with iterative preference optimization, specifically off-policy and on-policy DPO guided exclusively by execution accuracy feedback. ExCoT dispenses with external reward models and human annotations, relying instead on internally generated reasoning steps and execution results. The method operates in two principal phases. First, it generates candidate CoT data validated through off-policy DPO, which forms the basis for supervised fine-tuning. Then the model iteratively generates and refines CoT data via on-policy DPO, incrementally improving accuracy through feedback derived from execution correctness.
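To make the preference-optimization step more concrete, here is a minimal sketch of the standard DPO loss for a single (chosen, rejected) pair. The article does not spell out the loss ExCoT uses, so this is the usual DPO objective rather than Snowflake's implementation; the function name `dpo_loss`, its parameters, and the default `beta=0.1` are assumptions made for illustration.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Standard DPO loss for one preference pair (hypothetical helper).

    Each argument is the summed log-probability of a full response
    (for ExCoT-style data: the CoT plus the final SQL) under either the
    policy being trained or the frozen reference model.
    """
    # How much more the policy likes each response than the reference does.
    chosen_margin = policy_chosen_logp - ref_chosen_logp
    rejected_margin = policy_rejected_logp - ref_rejected_logp
    logits = beta * (chosen_margin - rejected_margin)
    # -log(sigmoid(logits)), written in a numerically stable form.
    if logits >= 0:
        return math.log1p(math.exp(-logits))
    return -logits + math.log1p(math.exp(logits))


# Policy identical to the reference: no preference has been learned yet.
neutral = dpo_loss(-5.0, -5.0, -5.0, -5.0)
# Policy has shifted probability toward the chosen response.
improved = dpo_loss(-4.0, -9.0, -5.0, -5.0)
print(neutral, improved)
```

In the off-policy phase the pairs would come from responses sampled by a model other than the one being trained, and in the on-policy phase from the current model itself; the loss has the same form in both cases.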
ExCoT employs detailed CoT reasoning, in particular a divide-and-conquer strategy in which complex queries are decomposed into simpler sub-queries. Each sub-query is analyzed and resolved independently before being integrated into a coherent final query. This structured decomposition lets the model handle the complexity and nested structures common in SQL more effectively. Execution-based verification serves as the core mechanism for evaluating correctness: generated queries are validated by comparing their execution outputs against ground-truth results. Correct and incorrect queries are then systematically paired, providing explicit signals for preference-based learning. The iterative refinement in the on-policy DPO phase progressively improves the model's reasoning accuracy.
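The execution-based check and the pairing of correct with incorrect queries can be sketched as follows. This is an illustrative sketch using Python's built-in `sqlite3` module, not code from the ExCoT repository: the helper names `run_query` and `build_preference_pairs`, the toy `singer` table, and the choice to compare results as order-insensitive row lists are all assumptions. In ExCoT each candidate would also carry its CoT text; bare SQL strings are used here for brevity.

```python
import sqlite3
from itertools import product


def run_query(conn, sql):
    """Execute sql; return rows in a canonical order, or None if it fails."""
    try:
        rows = conn.execute(sql).fetchall()
    except sqlite3.Error:
        return None
    # Simplification: ignore row order when comparing result sets.
    return sorted(rows, key=repr)


def build_preference_pairs(conn, gold_sql, candidates):
    """Label candidates by execution match, then pair correct with incorrect."""
    gold = run_query(conn, gold_sql)
    if gold is None:
        raise ValueError("gold query failed to execute")
    correct, incorrect = [], []
    for sql in candidates:
        (correct if run_query(conn, sql) == gold else incorrect).append(sql)
    return [{"chosen": c, "rejected": r} for c, r in product(correct, incorrect)]


# Toy database standing in for a benchmark schema (hypothetical data).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE singer (id INTEGER, name TEXT, age INTEGER);
    INSERT INTO singer VALUES (1, 'Ana', 31), (2, 'Bo', 24), (3, 'Cy', 45);
""")
gold_sql = "SELECT name FROM singer WHERE age > 30"
candidates = [
    "SELECT name FROM singer WHERE age >= 31",  # different text, same result
    "SELECT name FROM singer WHERE age < 30",   # runs, but wrong result
    "SELECT nme FROM singer",                   # fails to execute
]
pairs = build_preference_pairs(conn, gold_sql, candidates)
for pair in pairs:
    print(pair)
```

Because the label depends only on whether execution results match, a query that is worded differently from the gold SQL still counts as correct, and a query that fails to execute is treated the same as one that returns the wrong rows.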
Experimental evaluation of ExCoT showed significant improvements in execution accuracy. Specifically, with the LLaMA-3.1 70B model, ExCoT raised execution accuracy on the BIRD development set from 57.37% to 68.51%, and on the Spider test set from 78.81% to 86.59%. Comparable gains were recorded with the Qwen-2.5-Coder 32B model. These results position ExCoT as a leading approach in single-model evaluations on these benchmarks, surpassing established methods such as XiYanSQL and proprietary models including OpenAI variants. Notably, the improvements consistently maintained high query validity rates (exceeding 98%), confirming gains in semantic correctness alongside syntactic precision.
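For readers who want the size of these gains at a glance, the short snippet below computes the absolute and relative improvements from the four reported accuracy figures. Only the four percentages come from the article; the helper name `gains` and the rounding choices are illustrative.

```python
# (benchmark, baseline %, ExCoT %) as reported for LLaMA-3.1 70B
results = [
    ("BIRD dev", 57.37, 68.51),
    ("Spider test", 78.81, 86.59),
]


def gains(before, after):
    """Return (absolute gain in points, relative gain in percent)."""
    absolute = round(after - before, 2)
    relative = round(100 * (after - before) / before, 1)
    return absolute, relative


for name, before, after in results:
    absolute, relative = gains(before, after)
    print(f"{name}: +{absolute} points ({relative}% relative)")
```

The absolute gains work out to 11.14 points on BIRD dev and 7.78 points on Spider test.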
In conclusion, ExCoT represents a methodical advance in optimizing structured reasoning for open-source LLMs applied to text-to-SQL tasks. By combining structured CoT reasoning with preference optimization guided solely by execution-based feedback, ExCoT addresses limitations identified in previous methods. Its iterative refinement enables continued improvement without depending on external reward models or manual annotations. Further research might explore extending this framework to more intricate schema environments and additional structured reasoning tasks, broadening the applicability and reliability of LLMs in structured query generation.

Asif Razzaq is the CEO of Marktechpost Media Inc. As a visionary entrepreneur and engineer, Asif is committed to harnessing the potential of artificial intelligence for social good. His most recent endeavor is the launch of an artificial intelligence media platform, Marktechpost, which stands out for in-depth coverage of machine learning and deep learning news that is both technically sound and easily understandable by a wide audience. The platform draws more than 2 million monthly views, reflecting its popularity among readers.

2025-04-03 07:38:00