Bridging chemistry and artificial intelligence by a reaction description language

Thirunavukarasu, A. J. et al. Large language models in medicine. Nat. Med. 29, 1930–1940 (2023).
Boiko, D. A., MacKnight, R., Kline, B. & Gomes, G. Autonomous chemical research with large language models. Nature 624, 570–578 (2023).
Rives, A. et al. Biological structure and function emerge from scaling unsupervised learning to 250 million protein sequences. Proc. Natl Acad. Sci. USA 118, e2016239118 (2021).
Lin, Z. et al. Evolutionary-scale prediction of atomic-level protein structure with a language model. Science 379, 1123–1130 (2023).
Theodoris, C. V. et al. Transfer learning enables predictions in network biology. Nature 618, 616–624 (2023).
Grisoni, F. Chemical language models for de novo drug design: challenges and opportunities. Curr. Opin. Struct. Biol. 79, 102527 (2023).
Skinnider, M. A., Stacey, R. G., Wishart, D. S. & Foster, L. J. Chemical language models enable navigation in sparsely populated chemical space. Nat. Mach. Intell. 3, 759–770 (2021).
Moret, M. et al. Leveraging molecular structure and bioactivity with chemical language models for de novo drug design. Nat. Commun. 14, 114 (2023).
Weininger, D. SMILES, a chemical language and information system. 1. Introduction to methodology and encoding rules. J. Chem. Inf. Comput. Sci. 28, 31–36 (1988).
Krenn, M., Häse, F., Nigam, A., Friederich, P. & Aspuru-Guzik, A. Self-referencing embedded strings (SELFIES): a 100% robust molecular string representation. Mach. Learn. Sci. Technol. 1, 045024 (2020).
Kuenneth, C. & Ramprasad, R. polyBERT: a chemical language model to enable fully machine-driven ultrafast polymer informatics. Nat. Commun. 14, 4099 (2023).
Krenn, M. et al. SELFIES and the future of molecular string representations. Patterns 3, 100588 (2022).
Schwaller, P. et al. Molecular transformer: a model for uncertainty-calibrated chemical reaction prediction. ACS Cent. Sci. 5, 1572–1583 (2019).
Liu, B. et al. Retrosynthetic reaction prediction using neural sequence-to-sequence models. ACS Cent. Sci. 3, 1103–1113 (2017).
Sun, Y. & Sahinidis, N. V. Computer-aided retrosynthetic design: fundamentals, tools, and outlook. Curr. Opin. Chem. Eng. 35, 100721 (2022).
Wang, X. et al. RetroPrime: a diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chem. Eng. J. 420, 129845 (2021).
Thakkar, A. et al. Unbiasing retrosynthesis language models with disconnection prompts. ACS Cent. Sci. 9, 1488–1498 (2023).
Huang, T. & Li, Y. Current progress, challenges, and future perspectives of language models for protein representation and protein design. The Innovation 4, 100446 (2023).
Min, B. et al. Recent advances in natural language processing via large pre-trained language models: a survey. ACM Comput. Surv. 56, 1–40 (2023).
Schwaller, P., Hoover, B., Reymond, J.-L., Strobelt, H. & Laino, T. Extraction of organic chemistry grammar from unsupervised learning of chemical reactions. Sci. Adv. 7, eabe4166 (2021).
Schwaller, P. et al. Mapping the space of chemical reactions using attention-based neural networks. Nat. Mach. Intell. 3, 144–152 (2021).
Strieth-Kalthoff, F. et al. Artificial intelligence for retrosynthetic planning needs both data and expert knowledge. J. Am. Chem. Soc. 146, 11005–11017 (2024).
Nugmanov, R. I. et al. CGRtools: Python library for molecule, reaction, and condensed graph of reaction processing. J. Chem. Inf. Model. 59, 2516–2521 (2019).
Shi, C., Xu, M., Guo, H., Zhang, M. & Tang, J. A graph to graphs framework for retrosynthesis prediction. In Proc. 37th International Conference on Machine Learning (eds Blei, D. et al.) 8818–8827 (PMLR, 2020).
Yan, C. et al. RetroXpert: decompose retrosynthesis prediction like a chemist. Adv. Neural Inf. Process. Syst. 33, 11248–11258 (2020).
Somnath, V. R., Bunne, C., Coley, C., Krause, A. & Barzilay, R. Learning graph models for retrosynthesis prediction. Adv. Neural Inf. Process. Syst. 34, 9405–9415 (2021).
Zhong, W., Yang, Z. & Chen, C. Y.-C. Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nat. Commun. 14, 3009 (2023).
Wang, Y. et al. Retrosynthesis prediction with an interpretable deep-learning framework based on molecular assembly tasks. Nat. Commun. 14, 6155 (2023).
Saebi, M. et al. On the use of real-world datasets for reaction yield prediction. Chem. Sci. 14, 4997–5005 (2023).
Lu, J. & Zhang, Y. Unified deep learning model for multitask reaction predictions with explanation. J. Chem. Inf. Model. 62, 1376–1387 (2022).
Wan, Y., Hsieh, C.-Y., Liao, B. & Zhang, S. Retroformer: pushing the limits of end-to-end retrosynthesis transformer. In Proc. 39th International Conference on Machine Learning (eds Chaudhuri, K. et al.) 22475–22490 (PMLR, 2022).
Dong, J. et al. Ketones and aldehydes as alkyl radical equivalents for C–H functionalization of heteroarenes. Sci. Adv. 5, eaax9955 (2019).
Peltzer, R. M., Gauss, J., Eisenstein, O. & Cascella, M. The Grignard reaction–unraveling a chemical puzzle. J. Am. Chem. Soc. 142, 2984–2994 (2020).
Heravi, M. M., Hashemi, E. & Nazari, N. Negishi coupling: an easy progress for C–C bond construction in total synthesis. Mol. Divers. 18, 441–472 (2014).
Kotha, S., Lahiri, K. & Kashinath, D. Recent applications of the Suzuki–Miyaura cross-coupling reaction in organic synthesis. Tetrahedron 58, 9633–9695 (2002).
Zhou, J., Zhao, Z. & Shibata, N. Transition-metal-free silylboronate-mediated cross-couplings of organic fluorides with amines. Nat. Commun. 14, 1847 (2023).
Vulovic, B., Cinderella, A. P. & Watson, D. A. Palladium-catalyzed cross-coupling of monochlorosilanes and Grignard reagents. ACS Catal. 7, 8113–8117 (2017).
Xu, W. Q., Xu, X. H. & Qing, F. L. Synthesis and properties of CF3(OCF3)CH-substituted arenes and alkenes. Chin. J. Chem. 38, 847–854 (2020).
Van der Maaten, L. & Hinton, G. Visualizing data using t-SNE. J. Mach. Learn. Res. 9, 2579–2605 (2008).
Probst, D., Schwaller, P. & Reymond, J.-L. Reaction classification and yield prediction using the differential reaction fingerprint DRFP. Digit. Discov. 1, 91–97 (2022).
Schneider, N., Lowe, D. M., Sayle, R. A. & Landrum, G. A. Development of a novel fingerprint for chemical reactions and its application to large-scale reaction classification and similarity. J. Chem. Inf. Model. 55, 39–53 (2015).
Kajino, M., Hasuoka, A. & Nishida, H. 1-Heterocyclylsulfonyl, 2-aminomethyl, 5-(hetero-)aryl substituted 1-H-pyrrole derivatives as acid secretion inhibitors. Patent WO2007026916A1 (2007).
Yu, Q.-Y., Zeng, H., Yao, K., Li, J.-Q. & Liu, Y. Novel and practical synthesis of vonoprazan fumarate. Synth. Commun. 47, 1169–1174 (2017).
Chen, S. & Jung, Y. Deep retrosynthetic reaction prediction using local reactivity and global attention. JACS Au 1, 1612–1620 (2021).
Tetko, I. V., Karpov, P., Van Deursen, R. & Godin, G. State-of-the-art augmented NLP transformer models for direct and single-step retrosynthesis. Nat. Commun. 11, 5575 (2020).
Zhong, Z. et al. Root-aligned SMILES: a tight representation for chemical reaction prediction. Chem. Sci. 13, 9023–9034 (2022).
Irwin, R., Dimitriadis, S., He, J. & Bjerrum, E. J. Chemformer: a pre-trained transformer for computational chemistry. Mach. Learn. Sci. Technol. 3, 015022 (2022).
Zdrazil, B. et al. The ChEMBL Database in 2023: a drug discovery platform spanning multiple bioactivity data types and time periods. Nucleic Acids Res. 52, D1180–D1192 (2024).
Tingle, B. I. et al. ZINC-22—a free multi-billion-scale database of tangible compounds for ligand discovery. J. Chem. Inf. Model. 63, 1166–1176 (2023).
Chilingaryan, G. et al. BARTSmiles: generative masked language models for molecular representations. J. Chem. Inf. Model. 64, 5832–5843 (2024).
zw-SIMM & Xiong, J. jiachengxiong/ReactSeq: ReactSeq (1.0). Zenodo https://doi.org/10.5281/zenodo.13338263 (2024).
Coley, C. W., Rogers, L., Green, W. H. & Jensen, K. F. Computer-assisted retrosynthesis based on molecular similarity. ACS Cent. Sci. 3, 1237–1245 (2017).
Segler, M. H. & Waller, M. P. Neural-symbolic machine learning for retrosynthesis and reaction prediction. Chem. Eur. J. 23, 5966–5971 (2017).
Dai, H., Li, C., Coley, C., Dai, B. & Song, L. Retrosynthesis prediction with conditional graph logic network. Adv. Neural Inf. Process. Syst. 32, 8872–8882 (2019).
Sacha, M. et al. Molecule edit graph attention network: modeling chemical reactions as sequences of graph edits. J. Chem. Inf. Model. 61, 3273–3284 (2021).
Chen, Z., Ayinde, O. R., Fuchs, J. R., Sun, H. & Ning, X. G2Retro as a two-step graph generative models for retrosynthesis prediction. Commun. Chem. 6, 102 (2023).
Yao, L. et al. Node-aligned graph-to-graph: elevating template-free deep learning approaches in single-step retrosynthesis. JACS Au 4, 992–1003 (2024).
Liu, X. et al. RetroCaptioner: beyond attention in end-to-end retrosynthesis transformer via contrastively captioned learnable graph representation. Bioinformatics 40, btae561 (2024).
Zheng, S., Rao, J., Zhang, Z., Xu, J. & Yang, Y. Predicting retrosynthetic reactions using self-corrected transformer neural networks. J. Chem. Inf. Model. 60, 47–55 (2019).
Sun, R., Dai, H., Li, L., Kearnes, S. & Dai, B. Towards understanding retrosynthesis by energy-based models. Adv. Neural Inf. Process. Syst. 34, 10186–10194 (2021).