Large language models to accelerate organic chemistry synthesis

Mendoza, A., Ishihara, Y. & Baran, P. S. Scalable enantioselective total synthesis of taxanes. Nat. Chem. 4, 21–25 (2012).
Elvira, K. S., i Solvas, X. C., Wootton, R. C. & deMello, A. J. The past, present and potential for microfluidic reactor technology in chemical synthesis. Nat. Chem. 5, 905–915 (2013).
Ball, P. Chemistry: why synthesize? Nature 528, 327–329 (2015).
Newman-Stonebraker, S. H. et al. Univariate classification of phosphine ligation state and reactivity in cross-coupling catalysis. Science 374, 301–308 (2021).
Mikulak-Klucznik, B. et al. Computational planning of the synthesis of complex natural products. Nature 588, 83–88 (2020).
Jablonka, K. M., Schwaller, P., Ortega-Guerrero, A. & Smit, B. Leveraging large language models for predictive chemistry. Nat. Mach. Intell. 6, 161–169 (2024).
Shen, Y. et al. Automation and computer-assisted planning for chemical synthesis. Nat. Rev. Methods Primers 1, 23 (2021).
Tao, H. et al. Nanoparticle synthesis assisted by machine learning. Nat. Rev. Mater. 6, 701–716 (2021).
Merchant, A. et al. Scaling deep learning for materials discovery. Nature 624, 80–85 (2023).
Burger, B. et al. A mobile robotic chemist. Nature 583, 237–241 (2020).
Angello, N. H. et al. Closed-loop optimization of general reaction conditions for heteroaryl Suzuki–Miyaura coupling. Science 378, 399–405 (2022).
Betinol, I. O., Lai, J., Thakur, S. & Reid, J. P. A data-driven workflow for assigning and predicting generality in asymmetric catalysis. J. Am. Chem. Soc. 145, 12870–12883 (2023).
Rinehart, N. I. et al. A machine-learning tool to predict substrate-adaptive conditions for Pd-catalyzed C–N couplings. Science 381, 965–972 (2023).
Granda, J. M., Donina, L., Dragone, V., Long, D.-L. & Cronin, L. Controlling an organic synthesis robot with machine learning to search for new reactivity. Nature 559, 377–381 (2018).
Mehr, S. H. M., Craven, M., Leonov, A. I., Keenan, G. & Cronin, L. A universal system for digitization and automatic execution of the chemical synthesis literature. Science 370, 101–108 (2020).
Rohrbach, S. et al. Digitization and validation of a chemical synthesis literature database in the ChemPU. Science 377, 172–180 (2022).
Sanchez-Lengeling, B. & Aspuru-Guzik, A. Inverse molecular design using machine learning: generative models for matter engineering. Science 361, 360–365 (2018).
Wang, H. et al. Scientific discovery in the age of artificial intelligence. Nature 620, 47–60 (2023).
Toniato, A., Schwaller, P., Cardinale, A., Geluykens, J. & Laino, T. Unassisted noise reduction of chemical reaction datasets. Nat. Mach. Intell. 3, 485–494 (2021).
Achiam, J. et al. GPT-4 technical report. Preprint at https://doi.org/10.48550/arXiv.2303.08774 (2023).
Lehr, S. A., Caliskan, A., Liyanage, S. & Banaji, M. R. ChatGPT as research scientist: probing GPT’s capabilities as a research librarian. Proc. Natl Acad. Sci. USA 121, e2404328121 (2024).
Kang, Y. & Kim, J. ChatMOF: an artificial intelligence system for predicting and generating metal–organic frameworks using large language models. Nat. Commun. 15, 4705 (2024).
Dagdelen, J. et al. Structured information extraction from scientific text with large language models. Nat. Commun. 15, 1418 (2024).
Hou, W. & Ji, Z. Assessing GPT-4 for cell type annotation in single-cell RNA-seq analysis. Nat. Methods 21, 1462–1465 (2024).
Zheng, Z. et al. A GPT-4 reticular chemist for guiding MOF discovery. Angew. Chem. Int. Ed. 62, e202311983 (2023).
Boiko, D. A., MacKnight, R., Kline, B. & Gomes, G. Autonomous chemical research with large language models. Nature 624, 570–578 (2023).
Canty, R. B. & Abolhasani, M. Reproducibility in automated chemistry laboratories using computer science abstractions. Nat. Synth. 3, 1327–1339 (2024).
Ruan, Y. et al. An automatic end-to-end chemical synthesis development platform powered by large language models. Nat. Commun. 15, 10160 (2024).
Zheng, Z. et al. ChatGPT research group for optimizing the crystallinity of MOFs and COFs. ACS Cent. Sci. 9, 2161–2170 (2023).
Bran, A. M. et al. Augmenting large language models with chemistry tools. Nat. Mach. Intell. 6, 525–535 (2024).
Zheng, Z., Zhang, O., Borgs, C., Chayes, J. T. & Yaghi, O. M. ChatGPT chemistry assistant for text mining and the prediction of MOF synthesis. J. Am. Chem. Soc. 145, 18048–18062 (2023).
Antunes, L. M., Butler, K. T. & Grau-Crespo, R. Crystal structure generation with autoregressive large language modeling. Nat. Commun. 15, 10570 (2024).
Zheng, Z. et al. Integrating machine learning and large language models to advance exploration of electrochemical reactions. Angew. Chem. Int. Ed. 137, e202418074 (2024).
Ramos, M. C., Collison, C. J. & White, A. D. A review of large language models and autonomous agents in chemistry. Chem. Sci. 16, 2514–2572 (2025).
Segler, M. H., Preuss, M. & Waller, M. P. Planning chemical syntheses with deep neural networks and symbolic AI. Nature 555, 604–610 (2018).
Coley, C. W. et al. A robotic platform for flow synthesis of organic compounds informed by AI planning. Science 365, eaax1566 (2019).
Shields, B. J. et al. Bayesian reaction optimization as a tool for chemical synthesis. Nature 590, 89–96 (2021).
Tang, T. et al. Interrogating the mechanistic features of Ni(I)-mediated aryl iodide oxidative addition using electroanalytical and statistical modeling techniques. J. Am. Chem. Soc. 145, 8689–8699 (2023).
Wang, J. Y. et al. Identifying general reaction conditions by bandit optimization. Nature 626, 1025–1033 (2024).
Raghavan, P. et al. Dataset design for building models of chemical reactivity. ACS Cent. Sci. 9, 2196–2204 (2023).
Frey, N. C. et al. Neural scaling of deep chemical models. Nat. Mach. Intell. 5, 1297–1305 (2023).
Kearnes, S. M. et al. The Open Reaction Database. J. Am. Chem. Soc. 143, 18820–18826 (2021).
Coley, C. W., Rogers, L., Green, W. H. & Jensen, K. F. Computer-assisted retrosynthesis based on molecular similarity. ACS Cent. Sci. 3, 1237–1245 (2017).
Lowe, D. M. Extraction of Chemical Structures and Reactions from the Literature. PhD thesis, University of Cambridge (2012).
Tu, Z. & Coley, C. W. Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. J. Chem. Inf. Model. 62, 3503–3513 (2022).
Sacha, M. et al. Molecule edit graph attention network: modeling chemical reactions as sequences of graph edits. J. Chem. Inf. Model. 61, 3273–3284 (2021).
Seo, S.-W. et al. GTA: graph truncated attention for retrosynthesis. In Proc. AAAI Conference on Artificial Intelligence Vol. 35, 531–539 (AAAI Press, 2021).
Somnath, V. R., Bunne, C., Coley, C., Krause, A. & Barzilay, R. Learning graph models for retrosynthesis prediction. Adv. Neural Inf. Process. Syst. 34, 9405–9415 (2021).
Wang, X. et al. RetroPrime: a diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chem. Eng. J. 420, 129845 (2021).
Wan, Y., Hsieh, C.-Y., Liao, B. & Zhang, S. Retroformer: pushing the limits of end-to-end retrosynthesis transformer. In International Conference on Machine Learning 22475–22490 (PMLR, 2022).
Chen, S. & Jung, Y. Deep retrosynthetic reaction prediction using local reactivity and global attention. JACS Au 1, 1612–1620 (2021).
Yao, L. et al. Node-aligned graph-to-graph: elevating template-free deep learning approaches in single-step retrosynthesis. JACS Au 4, 992–1003 (2024).
Coley, C. W., Barzilay, R., Jaakkola, T. S., Green, W. H. & Jensen, K. F. Prediction of organic reaction outcomes using machine learning. ACS Cent. Sci. 3, 434–443 (2017).
Ahneman, D. T., Estrada, J. G., Lin, S., Dreher, S. D. & Doyle, A. G. Predicting reaction performance in C–N cross-coupling using machine learning. Science 360, 186–190 (2018).
Li, S.-W., Xu, L.-C., Zhang, C., Zhang, S.-Q. & Hong, X. Reaction performance prediction with an extrapolative and interpretable graph model based on chemical knowledge. Nat. Commun. 14, 3569 (2023).
Szymanski, N. J. et al. An autonomous laboratory for the accelerated synthesis of novel materials. Nature 624, 86–91 (2023).
Saebi, M. et al. On the use of real-world datasets for reaction yield prediction. Chem. Sci. 14, 4997–5005 (2023).
Li, D.-Z. & Gong, X.-Q. Challenges with literature-derived data in machine learning for yield prediction: a case study on Pd-catalyzed carbonylation reactions. J. Phys. Chem. A 128, 10423–10430 (2024).
Li, X., Zhang, S.-Q., Xu, L.-C. & Hong, X. Predicting regioselectivity in radical C–H functionalization of heterocycles through machine learning. Angew. Chem. Int. Ed. 59, 13253–13259 (2020).
Zahrt, A. F. et al. Prediction of higher-selectivity catalysts by computer-driven workflow and machine learning. Science 363, eaau5631 (2019).
Perera, D. et al. A platform for automated nanomole-scale reaction screening and micromole-scale synthesis in flow. Science 359, 429–434 (2018).
Guo, T. et al. What can large language models do in chemistry? A comprehensive benchmark on eight tasks. Adv. Neural Inf. Process. Syst. 36, 59662–59688 (2023).
Taylor, R. D., MacCoss, M. & Lawson, A. D. Rings in drugs: miniperspective. J. Med. Chem. 57, 5845–5859 (2014).
Ma, X. et al. A general approach to stereospecific cross-coupling reactions of nitrogen-containing stereocenters. Chem 6, 781–791 (2020).
Shu, X., Zhong, D., Lin, Y., Qin, X. & Huo, H. Modular access to chiral α-(hetero)aryl amines via Ni/photoredox-catalyzed enantioselective cross-coupling. J. Am. Chem. Soc. 144, 8797–8806 (2022).
Sarkar, S., Wagulde, S., Jia, X. & Gevorgyan, V. General and selective metal-free radical α-C–H borylation of aliphatic amines. Chem 8, 3096–3108 (2022).
Zhang, Y. et al. Large language models to accelerate organic chemistry synthesis. Zenodo https://doi.org/10.5281/zenodo.15295848 (2025).
Ruiz-Castillo, P. & Buchwald, S. L. Applications of palladium-catalyzed C–N cross-coupling reactions. Chem. Rev. 116, 12564–12649 (2016).
2025-07-01 00:00:00