A comprehensive large-scale biomedical knowledge graph for AI-powered data-driven biomedical research

  • Kitano, H. Nobel Turing Challenge: creating the engine for scientific discovery. npj Syst. Biol. Appl. 7, 29 (2021).

  • Li, L. et al. Real-world data medical knowledge graph: construction and applications. Artif. Intell. Med. 103, 101817 (2020).

  • Yu, S. et al. BIOS: an algorithmically generated biomedical knowledge graph. Preprint (2022).

  • Nicholson, D. N. & Greene, C. S. Constructing knowledge graphs and their biomedical applications. Comput. Struct. Biotechnol. J. 18, 1414–1428 (2020).

  • Gao, Z., Ding, P. & Xu, R. KG-Predict: a knowledge graph computational framework for drug repurposing. J. Biomed. Inform. 132, 104133 (2022).

  • Li, N. et al. KGHC: a knowledge graph for hepatocellular carcinoma. BMC Med. Inf. Decis. Making 20, 135 (2020).

  • Ernst, P., Siu, A. & Weikum, G. KnowLife: a versatile approach for constructing a large knowledge graph for biomedical sciences. BMC Bioinf. 16, 157 (2015).

  • Zheng, S. et al. PharmKG: a dedicated knowledge graph benchmark for biomedical data mining. Briefings Bioinform. 22, bbaa344 (2021).

  • Petasis, G. et al. Using machine learning to maintain rule-based named-entity recognition and classification systems. In Proc. 39th Annual Meeting on Association for Computational Linguistics: ACL ’01 426–433 (Association for Computational Linguistics, 2001).

  • Kim, J.-H. & Woodland, P.C. A rule-based named entity recognition system for speech input. In Proc. 6th International Conference on Spoken Language Processing (ICSLP 2000) (eds Yuan, B. et al.) 528–531 (International Speech Communication Association, 2000).

  • Miyao, Y., Sagae, K., Sætre, R., Matsuzaki, T. & Tsujii, J. Evaluating contributions of natural language parsers to protein–protein interaction extraction. Bioinformatics 25, 394–400 (2009).

  • Lee, J., Kim, S., Lee, S., Lee, K. & Kang, J. On the efficacy of per-relation basis performance evaluation for PPI extraction and a high-precision rule-based approach. BMC Med. Inf. Decis. Making 13, S7 (2013).

  • Raja, K., Subramani, S. & Natarajan, J. PPInterFinder—a mining tool for extracting causal relations on human proteins from literature. Database 2013, bas052 (2013).

  • Kim, J.-H., Kang, I.-H. & Choi, K.-S. Unsupervised named entity classification models and their ensembles. In Proc. 19th International Conference on Computational Linguistics (COLING 2002) (eds Tseng, S.-C. et al.) 1–7 (Association for Computational Linguistics, 2002).

  • Li, L., Zhou, R. & Huang, D. Two-phase biomedical named entity recognition using CRFs. Comput. Biol. Chem. 33, 334–338 (2009).

  • Tikk, D., Thomas, P., Palaga, P., Hakenberg, J. & Leser, U. A comprehensive benchmark of kernel methods to extract protein–protein interactions from literature. PLoS Comput. Biol. 6, e1000837 (2010).

  • Bui, Q.-C., Katrenko, S. & Sloot, P. M. A. A hybrid approach to extract protein–protein interactions. Bioinformatics 27, 259–265 (2011).

  • Patra, R. & Saha, S. K. A kernel-based approach for biomedical named entity recognition. Sci. World J. 2013, 950796 (2013).

  • Hong, L. et al. A novel machine learning framework for automated biomedical relation extraction from large-scale literature repositories. Nat. Mach. Intell. 2, 347–355 (2020).

  • Zhang, H.-T., Huang, M.-L. & Zhu, X.-Y. A unified active learning framework for biomedical relation extraction. J. Comput. Sci. Technol. 27, 1302–1313 (2012).

  • Yu, K. et al. Automatic extraction of protein-protein interactions using grammatical relationship graph. BMC Med. Inf. Decis. Making 18, 42 (2018).

  • Chowdhary, R., Zhang, J. & Liu, J. S. Bayesian inference of protein–protein interactions from biological literature. Bioinformatics 25, 1536–1542 (2009).

  • Corbett, P. & Copestake, A. Cascaded classifiers for confidence-based chemical named entity recognition. BMC Bioinf. 9, S4 (2008).

  • Lung, P.-Y., He, Z., Zhao, T., Yu, D. & Zhang, J. Extracting chemical–protein interactions from literature using sentence structure analysis and feature engineering. Database 2019, bay138 (2019).

  • Bell, L., Chowdhary, R., Liu, J. S., Niu, X. & Zhang, J. Integrated bio-entity network: a system for biological knowledge discovery. PLoS ONE 6, e21474 (2011).

  • Kim, S., Yoon, J. & Yang, J. Kernel approaches for genic interaction extraction. Bioinformatics 24, 118–126 (2008).

  • Bell, L., Zhang, J. & Niu, X. Mixture of logistic models and an ensemble approach for protein-protein interaction extraction. In Proc. 2nd ACM Conference on Bioinformatics, Computational Biology and Biomedicine (eds Grossman, R. et al.) 371–375 (Association for Computing Machinery, 2011).

  • Florian, R., Ittycheriah, A., Jing, H. & Zhang, T. Named entity recognition through classifier combination. In Proc. 7th Conf. Natural Language Learning at HLT-NAACL 2003 (CoNLL ’03) (eds Daelemans, W. et al.) 168–171 (Association for Computational Linguistics, 2003).

  • Leaman, R., Wei, C.-H. & Lu, Z. tmChem: a high performance approach for chemical named entity recognition and normalization. J. Cheminform. 7, S3 (2015).

  • Qu, J. et al. Triage of documents containing protein interactions affected by mutations using an NLP based machine learning approach. BMC Genomics 21, 773 (2020).

  • Nguyen, T. H. & Grishman, R. Relation extraction: perspective from convolutional neural networks. In Proc. 1st Workshop on Vector Space Modeling for Natural Language Processing (eds Blunsom, P. et al.) 39–48 (Association for Computational Linguistics, 2015).

  • He, D., Zhang, H., Hao, W., Zhang, R. & Cheng, K. A customized attention-based long short-term memory network for distant supervised relation extraction. Neural Comput. 29, 1964–1985 (2017).

  • Li, F., Zhang, M., Fu, G. & Ji, D. A neural joint model for entity and relation extraction from biomedical text. BMC Bioinf. 18, 198 (2017).

  • Crichton, G., Pyysalo, S., Chiu, B. & Korhonen, A. A neural network multi-task learning approach to biomedical named entity recognition. BMC Bioinf. 18, 368 (2017).

  • Luo, L. et al. An attention-based BiLSTM-CRF approach to document-level chemical named entity recognition. Bioinformatics 34, 1381–1388 (2018).

  • Guo, Z., Zhang, Y. & Lu, W. Attention guided graph convolutional networks for relation extraction. In Proc. 57th Annual Meeting of the Association for Computational Linguistics (eds Korhonen, A. et al.) 241–251 (Association for Computational Linguistics, 2019).

  • Gridach, M. Character-level neural network for biomedical named entity recognition. J. Biomed. Inform. 70, 85–91 (2017).

  • Lim, S. & Kang, J. Chemical–gene relation extraction using recursive neural network. Database 2018, bay060 (2018).

  • Gu, J., Sun, F., Qian, L. & Zhou, G. Chemical-induced disease relation extraction via convolutional neural network. Database 2017, bax024 (2017).

  • Habibi, M., Weber, L., Neves, M., Wiegandt, D. L. & Leser, U. Deep learning with word embeddings improves biomedical named entity recognition. Bioinformatics 33, i37–i48 (2017).

  • Liu, S. et al. Extracting chemical–protein relations using attention-based neural networks. Database 2018, bay102 (2018).

  • Wu, H. & Huang, J. Joint entity and relation extraction network with enhanced explicit and implicit semantic information. Appl. Sci. 12, 6231 (2022).

  • Akbik, A., Bergmann, T. & Vollgraf, R. Pooled contextualized embeddings for named entity recognition. In Proc. 2019 Conference of the North (eds Burstein, J. et al.) 724–728 (Association for Computational Linguistics, 2019).

  • Eberts, M. & Ulges, A. Span-based Joint Entity and Relation Extraction with Transformer Pre-Training (IOS, 2019).

  • Zhuang, L., Lin, W., Ya, S. & Zhao, J. A robustly optimized BERT pre-training approach with post-training. In Proc. 20th Chinese Natl. Conf. Computational Linguistics (eds Li, S. et al.) 1218–1227 (Chinese Information Processing Society of China, 2021).

  • Devlin, J., Chang, M.-W., Lee, K. & Toutanova, K. BERT: pre-training of deep bidirectional transformers for language understanding. In Proc. NAACL-HLT 2019 4171–4186 (Association for Computational Linguistics, 2019).

  • Nguyen, D. Q., Vu, T. & Nguyen, A. T. BERTweet: a pre-trained language model for English Tweets. In Proc. 2020 Conf. Empirical Methods in Natural Language Processing: System Demonstrations (eds Liu, Q. & Schlangen, D.) 9–14 (Association for Computational Linguistics, 2020).

  • Lee, J. et al. BioBERT: a pre-trained biomedical language representation model for biomedical text mining. Bioinformatics 36, 1234–1240 (2019).

  • Liang, C. et al. BOND: BERT-assisted open-domain named entity recognition with distant supervision. In Proc. 26th ACM SIGKDD Int. Conf. Knowledge Discovery & Data Mining (KDD ’20) (eds Gupta, R. et al.) 1054–1064 (Association for Computing Machinery, 2020).

  • Wadden, D., Wennberg, U., Luan, Y. & Hajishirzi, H. Entity, relation, and event extraction with contextualized span representations. In Proc. 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP) 5784–5789 (Association for Computational Linguistics, 2019).

  • Zhang, Z. et al. ERNIE: enhanced language representation with informative entities. In Proc. 57th Annual Meeting of the Association for Computational Linguistics (eds Korhonen, A. et al.) 1441–1451 (Association for Computational Linguistics, 2019).

  • Chang, H., Xu, H., van Genabith, J., Xiong, D. & Zan, H. JoinER-BART: joint entity and relation extraction with constrained decoding, representation reuse and fusion. IEEE/ACM Trans. Audio Speech Lang. Proc. (2023).

  • Yamada, I., Asai, A., Shindo, H., Takeda, H. & Matsumoto, Y. LUKE: deep contextualized entity representations with entity-aware self-attention. In Proc. the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP) (eds Webber, B. et al.) 6442–6454 (Association for Computational Linguistics, 2020).

  • Beltagy, I., Lo, K. & Cohan, A. SciBERT: a pretrained language model for scientific text. In Proc. 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP) (eds Inui, K. et al.) 3613–3618 (Association for Computational Linguistics, 2019).

  • Radford, A. et al. Language models are unsupervised multitask learners. OpenAI (2019).

  • Radford, A., Narasimhan, K., Salimans, T. & Sutskever, I. Improving language understanding by generative pre-training. OpenAI (2018).

  • Brown, T. B. et al. Language models are few-shot learners. In Proc. 34th International Conference on Neural Information Processing Systems (eds Larochelle, H. et al.) Vol. 33, 1877–1901 (Curran Associates Inc., 2020).

  • Wei, X. et al. Zero-shot information extraction via chatting with ChatGPT. Preprint (2023).

  • Pan, J. Z. et al. Large language models and knowledge graphs: opportunities and challenges. Trans. Graph Data Knowl. 1, 2:1–2:38 (2023).

  • Zhu, Y. et al. LLMs for knowledge graph construction and reasoning: recent capabilities and future opportunities. World Wide Web 27, 58 (2023).

  • Kandpal, N., Deng, H., Roberts, A., Wallace, E. & Raffel, C. Large language models struggle to learn long-tail knowledge. In Proc. 40th Int. Conf. Machine Learning (ICML 2023) (eds Krause, A. et al.) Vol. 202, 15708–15719 (PMLR, 2023).

  • Li, T., Hosseini, M. J., Weber, S. & Steedman, M. Language models are poor learners of directional inference. In Findings of the Association for Computational Linguistics: EMNLP 2022 (eds Goldberg, Y. et al.) 903–921 (Association for Computational Linguistics, 2022).

  • Elazar, Y. et al. Measuring and improving consistency in pretrained language models. Trans. Assoc. Comput. Ling. 9, 1012–1031 (2021).

  • Heinzerling, B. & Inui, K. Language models as knowledge bases: on entity representations, storage capacity, and paraphrased queries. In Proc. 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume (eds Merlo, P. et al.) 1772–1791 (Association for Computational Linguistics, 2021).

  • Zheng, Q., Guo, K. & Xu, L. A large-scale Chinese patent dataset for information extraction. Syst. Sci. Control Eng. 12, 2365328 (2024).

  • Stoica, G., Platanios, E. A. & Poczos, B. Re-TACRED: addressing shortcomings of the TACRED dataset. In Proc. AAAI Conf. Artif. Intell. Vol. 35, 13843–13850 (2021).

  • Luan, Y., He, L., Ostendorf, M. & Hajishirzi, H. Multi-task identification of entities, relations, and coreference for scientific knowledge graph construction. In Proc. 2018 Conference on Empirical Methods in Natural Language Processing (eds Riloff, E. et al.) 3219–3232 (Association for Computational Linguistics, 2018).

  • Wouters, O. J., McKee, M. & Luyten, J. Estimated research and development investment needed to bring a new medicine to market, 2009–2018. JAMA 323, 844 (2020).

  • Lovering, F., Bikker, J. & Humblet, C. Escape from flatland: increasing saturation as an approach to improving clinical success. J. Med. Chem. 52, 6752–6756 (2009).

  • Cui, L. et al. DETERRENT: knowledge guided graph attention network for detecting healthcare misinformation. In Proc. 26th ACM SIGKDD Int. Conf. Knowledge Discovery & Data Mining (KDD ’20) (eds Gupta, R. et al.) 492–502 (Association for Computing Machinery, 2020).

  • Mohamed, S. K., Nounu, A. & Nováček, V. Biological applications of knowledge graph embedding models. Briefings Bioinform. 22, 1679–1693 (2021).

  • Wang, C., Yu, H. & Wan, F. Information retrieval technology based on knowledge graph. In Proc. 3rd Int. Conf. Advances in Materials, Mechatronics and Civil Engineering (ICAMMCE 2018) 291–296 (Atlantis Press, 2018).

  • Himmelstein, D. S. et al. Systematic integration of biomedical knowledge prioritizes drugs for repurposing. eLife 6, e26726 (2017).

  • Azuaje, F. Drug interaction networks: an introduction to translational and clinical applications. Cardiovascular Res. 97, 631–641 (2013).

  • Ye, H., Liu, Q. & Wei, J. Construction of drug network based on side effects and its application for drug repositioning. PLoS ONE 9, e87864 (2014).

  • Chen, H., Zhang, H., Zhang, Z., Cao, Y. & Tang, W. Network-based inference methods for drug repositioning. Comput. Math. Methods Med. 2015, 130620 (2015).

  • Luo, Y. et al. A network integration approach for drug-target interaction prediction and computational drug repositioning from heterogeneous information. Nat. Commun. 8, 573 (2017).

  • Islamaj, R., Lai, P.-T., Wei, C.-H., Luo, L. & Lu, Z. The overview of the BioRED (Biomedical Relation Extraction Dataset) track at BioCreative VIII. Zenodo (2023).

  • Luo, L., Lai, P.-T., Wei, C.-H., Arighi, C. N. & Lu, Z. BioRED: a rich biomedical relation extraction dataset. Briefings Bioinform. 23, bbac282 (2022).

  • Ahmed, F. et al. SperoPredictor: an integrated machine learning and molecular docking-based drug repurposing framework with use case of COVID-19. Front. Public Health 10, 902123 (2022).

  • Ahmed, F. et al. A comprehensive review of artificial intelligence and network based approaches to drug repurposing in Covid-19. Biomed. Pharmacother. 153, 113350 (2022).

  • Zhou, Y. et al. Network-based drug repurposing for novel coronavirus 2019-nCoV/SARS-CoV-2. Cell Disc. 6, 14 (2020).

  • Aghdam, R., Habibi, M. & Taheri, G. Using informative features in machine learning based method for COVID-19 drug repurposing. J. Cheminformatics 13, 70 (2021).

  • Belikov, A. V., Rzhetsky, A. & Evans, J. Prediction of robust scientific facts from literature. Nat. Mach. Intell. 4, 445–454 (2022).

  • Gu, Y. et al. Domain-specific language model pretraining for biomedical natural language processing. ACM Trans. Comput. Healthcare 3, 1–23 (2022).

  • Reimers, N. & Gurevych, I. Sentence-BERT: sentence embeddings using Siamese BERT-networks. In Proc. 2019 Conf. Empirical Methods in Natural Language Processing and the 9th Int. Joint Conf. Natural Language Processing (EMNLP-IJCNLP) 3982–3992 (Association for Computational Linguistics, 2019).

  • Liu, Y. et al. RoBERTa: a robustly optimized BERT pretraining approach. Preprint (2019).

  • Raffel, C. et al. Exploring the limits of transfer learning with a unified text-to-text transformer. J. Mach. Learn. Res. 21, 1–67 (2020).

  • Peng, Y., Yan, S. & Lu, Z. Transfer learning in biomedical natural language processing: an evaluation of BERT and ELMo on ten benchmarking datasets. In Proc. 18th BioNLP Workshop and Shared Task (eds Demner-Fushman, D. et al.) 58–65 (Association for Computational Linguistics, 2019).

  • Alsentzer, E. et al. Publicly available clinical BERT embeddings. In Proc. 2nd Clinical Natural Language Processing Workshop (eds Rumshisky, A. et al.) 72–78 (Association for Computational Linguistics, 2019).

  • Sohn, S., Comeau, D. C., Kim, W. & Wilbur, W. J. Abbreviation definition identification based on automatic precision estimates. BMC Bioinf. 9, 402 (2008).

  • Chandak, P., Huang, K. & Zitnik, M. Building a knowledge graph to enable precision medicine. Sci. Data 10, 67 (2023).

  • Zhou, Y. et al. TTD: Therapeutic Target Database describing target druggability information. Nucleic Acids Res. 52, D1465–D1477 (2023).

  • Ashburner, M. et al. Gene ontology: tool for the unification of biology. The Gene Ontology Consortium. Nat. Genet. 25, 25–29 (2000).

  • Gene Ontology Consortium et al. The Gene Ontology knowledgebase in 2023. Genetics 224, iyad031 (2023).

  • Wilks, C. et al. recount3: summaries and queries for large-scale RNA-seq expression and splicing. Genome Biol. 22, 323 (2021).

  • Zhang, Y. et al. myinsilicom/iKraph: 1.0.0. Zenodo (2024).

  • Zhang, Y. et al. iKraph: a comprehensive, large-scale biomedical knowledge graph for AI-powered, data-driven biomedical research. Zenodo (2025).
