AI

Data-driven federated learning in drug discovery with knowledge distillation

  • Liu, X. et al. A comparison of deep learning performance against health-care professionals in detecting diseases from medical imaging: a systematic review and meta-analysis. Lancet Digit. Health 1, e271–e297 (2019).

    Article 
    MATH 

    Google Scholar 

  • Zhou, W. et al. Ensembled deep learning model outperforms human experts in diagnosing biliary atresia from sonographic gallbladder images. Nat. Commun. 12, 1259 (2021).

    Article 
    MATH 

    Google Scholar 

  • Topaloglu, M. Y., Morrell, E. M., Rajendran, S. & Topaloglu, U. In the pursuit of privacy: the promises and predicaments of federated learning in healthcare. Front. Artif. Intell. 4, 746497 (2021).

    Article 
    MATH 

    Google Scholar 

  • Brauneck, A. et al. Federated machine learning in data-protection-compliant research. Nat. Mach. Intell. 5, 2–4 (2023).

    Article 
    MATH 

    Google Scholar 

  • Bak, M. et al. Federated learning is not a cure-all for data ethics. Nat. Mach. Intell. 6, 370–372 (2024).

    Article 
    MATH 

    Google Scholar 

  • Zhu, H., Xu, J., Liu, S. & Jin, Y. Federated learning on non-IID data: a survey. Neurocomputing 465, 371–390 (2021).

    Article 
    MATH 

    Google Scholar 

  • McMahan, B., Moore, E., Ramage, D., Hampson, S. & Arcas, B. A. y. Communication-efficient learning of deep networks from decentralized data. In Proc. 20th International Conference on Artificial Intelligence and Statistics PMLR 54, 1273–1282 (2017).

  • Zhou, J. et al. A survey on federated learning and its applications for accelerating industrial internet of things. Preprint at (2021).

  • Li, L., Fan, Y., Tse, M. & Lin, K.-Y. A review of applications in federated learning. Comput. Ind. Eng. 149, 106854 (2020).

    Article 
    MATH 

    Google Scholar 

  • Li, T., Sahu, A. K., Talwalkar, A. & Smith, V. Federated learning: challenges, methods, and future directions. IEEE Signal Process. Mag. 37, 50–60 (2020).

    MATH 

    Google Scholar 

  • Yin, X., Zhu, Y. & Hu, J. A comprehensive survey of privacy-preserving federated learning: a taxonomy, review, and future directions. ACM Comput. Surv. 54, 131:1–131:36 (2021).

    MATH 

    Google Scholar 

  • Kairouz, P. et al. Advances and open problems in federated learning. Foundations and Trends in Machine Learning 14, 1–210, (2021).

  • Liu, J. et al. From distributed machine learning to federated learning: a survey. Knowl. Inf. Syst. 64, 885–917 (2022).

    Article 
    MATH 

    Google Scholar 

  • Konečný, J., McMahan, H. B., Ramage, D. & Richtárik, P. Federated optimization: distributed machine learning for on-device intelligence. Preprint at (2016).

  • Abadi, M. et al. Deep learning with differential privacy. In Proc. 2016 ACM SIGSAC Conference on Computer and Communications Security 308–318 (ACM, 2016).

  • Dwork, C. Differential privacy: a survey of results. In Proc. International Conference on Theory and Applications of Models of Computation (eds Agrawal, M. et al.) 1–19 (Springer, 2008).

  • Long, G., Tan, Y., Jiang, J. & Zhang, C. in Federated Learning: Privacy and Incentive (eds Yang, Q. et al.) 240–254 (Springer, 2020).

  • Rieke, N. et al. The future of digital health with federated learning. Npj Digit. Med. 3, 119 (2020).

    Article 
    MATH 

    Google Scholar 

  • Choudhury, O. et al. Predicting adverse drug reactions on distributed health data using federated learning. AMIA. Annu. Symp. Proc. 2019, 313–322 (2020).

    MATH 

    Google Scholar 

  • Nguyen, D. C. et al. Federated learning for smart healthcare: a survey. ACM Computing Surveys (Csur) 55, 1–37 (2022).

  • Xiong, Z. et al. Facing small and biased data dilemma in drug discovery with enhanced federated learning approaches. Sci. China Life Sci. 65, 529–539 (2022).

    Article 
    MATH 

    Google Scholar 

  • Manu, D. et al. FL-DISCO: federated generative adversarial network for graph-based molecule drug discovery: special session paper. In Proc. 2021 IEEE/ACM International Conference On Computer Aided Design (ICCAD) 1–7 (IEEE, 2021).

  • Naz, S., Phan, K. T. & Chen, Y.-P. P. A comprehensive review of federated learning for COVID-19 detection. Int. J. Intell. Syst. 37, 2371–2392 (2022).

    Article 
    MATH 

    Google Scholar 

  • Goldsmith, M. R. et al. in Crop Protection Products for Sustainable Agriculture (eds Rauzan, B. M. & Lorsbach, B. A.) Vol. 1390, 181–200 (American Chemical Society, 2021).

  • Heyndrickx, W. et al. MELLODDY: cross-pharma federated learning at unprecedented scale unlocks benefits in QSAR without compromising proprietary information. J. Chem. Inf. Model. 64, 2331–2344 (2024).

    Article 
    MATH 

    Google Scholar 

  • Hanser, T. Federated learning for molecular discovery. Curr. Opin. Struct. Biol. 79, 102545 (2023).

    Article 
    MATH 

    Google Scholar 

  • Konečný, J. et al. Federated learning: strategies for improving communication efficiency. Preprint at (2017).

  • Wu, C., Wu, F., Lyu, L., Huang, Y. & Xie, X. Communication-efficient federated learning via knowledge distillation. Nat. Commun. 13, 2032 (2022).

    Article 
    MATH 

    Google Scholar 

  • Zhu, X. Semi-Supervised Learning Literature Survey (Univ. Wisconsin, 2005); class=”c-article-references__item js-c-reading-companion-references-item” data-counter=”30.”>

    Hinton, G., Vinyals, O. & Dean, J. Distilling the knowledge in a neural network. Preprint at (2015).

  • Papernot, N., Abadi, M., Erlingsson, Ú., Goodfellow, I. & Talwar, K. Semi-supervised knowledge transfer for deep learning from private training data. Preprint at (2016).

  • Papernot, N. et al. Scalable private learning with PATE. Preprint at (2018).

  • Breiman, L. Bagging predictors. Mach. Learn. 24, 123–140 (1996).

    Article 
    MATH 

    Google Scholar 

  • Dietterich, T. G. Ensemble methods in machine learning. In Proc. International Workshop on Multiple Classifier Systems (eds Kittler, J. & Roli, F.) 1–15 (Springer, 2000).

  • Li, L., Gou, J., Yu, B., Du, L. & Tao, Z. Y. D. Federated distillation: a survey. Preprint at (2024).

  • Eldar, Y. C. et al. in Machine Learning and Wireless Communications (eds Goldsmith, A. et al.) 457–485 (Cambridge Univ. Press, 2022).

  • Li, D. & Wang, J. FedMD: heterogenous federated learning via model distillation. Preprint at (2019).

  • Itahara, S., Nishio, T., Koda, Y., Morikura, M. & Yamamoto, K. Distillation-based semi-supervised federated learning for communication-efficient collaborative training with non-IID private data. IEEE Trans. Mob. Comput. 22, 191–205 (2023).

    Article 

    Google Scholar 

  • Sattler, F., Marban, A., Rischke, R. & Samek, W. Communication-efficient federated distillation. Preprint at (2020).

  • Sui, D. et al. FedED: federated learning via ensemble distillation for medical relation extraction. In Proc. 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP) (eds Webber, B. et al.) 2118–2128 (Association for Computational Linguistics, 2020).

  • Han, S. et al. FedX: unsupervised federated learning with cross knowledge distillation. In European Conference on Computer Vision. (eds Avidan, S. et al.) 691–707 (Springer Nature Switzerland, 2022).

  • Jeong, E. et al. Communication-efficient on-device machine learning: federated distillation and augmentation under non-IID private data. Preprint at (2023).

  • Choquette-Choo, C. A. et al. CaPC learning: confidential and private collaborative learning. Preprint at (2021).

  • PyGrid: a peer-to-peer platform for private data science and federated learning OpenMined Blog / (2020).

  • FLuID POC platform. GitHub (2023).

  • Hancox, J. C., McPate, M. J., El Harchi, A. & Zhang, Y. H. The hERG potassium channel and hERG screening for drug-induced torsades de pointes. Pharmacol. Ther. 119, 118–132 (2008).

    Article 

    Google Scholar 

  • Wolford, B. What is GDPR, the EU’s new data protection ? GDPR.eu / (2018).

  • Shokri, R., Stronati, M., Song, C. & Shmatikov, V. Membership inference attacks against machine learning model. In IEEE Symposium on Security and Privacy (SP) 3–18 (IEEE, 2017).

  • Raipuria, G., Bonthu, S. & Singhal, N. Noise robust training of segmentation model using knowledge distillation. In Proc. Pattern Recognition. ICPR International Workshops and Challenges. ICPR 2021 (eds Del Bimbo, A. et al.) 97–104 (Springer, 2021).

  • Matthews, B. W. Comparison of the predicted and observed secondary structure of T4 phage lysozyme. Biochim. Biophys. Acta 405, 442–451 (1975).

    Article 
    MATH 

    Google Scholar 

  • Bassani, D., Brigo, A. & Andrews-Morger, A. Federated learning in computational toxicology: an industrial perspective on the Effiris Hackathon. Chem. Res. Toxicol. 36, 1503–1517 (2023).

    Article 

    Google Scholar 

  • Kim, S. et al. PubChem in 2021: new data content and improved web interfaces. Nucleic Acids Res. 49, D1388–D1395 (2021).

    Article 

    Google Scholar 

  • Bajusz, D., Rácz, A. & Héberger, K. Why is Tanimoto index an appropriate choice for fingerprint-based similarity calculations? J. Cheminformatics 7, 20 (2015).

    Article 
    MATH 

    Google Scholar 

  • Glen, R. et al. Circular fingerprints: flexible molecular descriptors with applications from physical chemistry to ADME. IDrugs Investig. Drugs J. 9, 199–204 (2006).

    MATH 

    Google Scholar 

  • Hudson, B. D., Hyde, R. M., Rahr, E., Wood, J. & Osman, J. Parameter based methods for compound selection from chemical databases. Quant. Struct. Act. Relatsh. 15, 285–289 (1996).

    Article 

    Google Scholar 

  • Rogers, D. & Hahn, M. Extended-connectivity fingerprints. J. Chem. Inf. Model. 50, 742–754 (2010).

    Article 
    MATH 

    Google Scholar 

  • Maggiora, G., Vogt, M., Stumpfe, D. & Bajorath, J. Molecular similarity in medicinal chemistry. J. Med. Chem. 57, 3186–3204 (2014).

    Article 
    MATH 

    Google Scholar 

  • Maggiora, G. M. Concepts and Applications of Molecular Similarity (eds Johnson, M. A. & Maggiora, G. M.) (John Wiley & Sons, 1990).

  • Pan, S. J. & Yang, Q. A survey on transfer learning. IEEE Trans. Knowl. Data Eng. 22, 1345–1359 (2010).

    Article 
    MATH 

    Google Scholar 

  • Gaulton, A. et al. ChEMBL: a large-scale bioactivity database for drug discovery. Nucleic Acids Res. 40, D1100–D1107 (2012).

    Article 

    Google Scholar 

  • MacQueen, J. Some methods for classification and analysis of multivariate observations. In Proc. Fifth Berkeley Symp. Math. Stat. Probab. (eds Marie Le Cam, L. & Neyman, J.) Vol. 1, 281–298 (1967).

  • Siramshetty, V. B., Chen, Q., Devarakonda, P. & Preissner, R. The catch-22 of predicting hERG blockade using publicly accessible bioactivity data. J. Chem. Inf. Model. 58, 1224–1233 (2018).

    Article 

    Google Scholar 

  • Ho, T. K. Random decision forests. In Proc. 3rd International Conference on Document Analysis and Recognition Vol. 1, 278–282 (1995).

  • Hanser, T., Barber, C., Marchaland, J. F. & Werner, S. Applicability domain: towards a more formal definition. SAR QSAR Environ. Res. (2016) .

  • Hanser, T. et al. Self organising hypothesis networks: a new approach for representing and structuring SAR knowledge. J. Cheminformatics 6, 21 (2014).

    Article 
    MATH 

    Google Scholar 

  • Carhart, R., Smith, D. H. & Venkataraghavan, R. Atom pairs as molecular features in structure-activity studies: definition and applications. J. Chem. Inf. Comput. Sci. (1985).

  • Hanser, T., Steinmetz, F. P., Plante, J., Rippmann, F. & Krier, M. Avoiding hERG-liability in drug design via synergetic combinations of different (Q)SAR methodologies and data sources: a case study in an industrial setting. J. Cheminformatics 11, 9 (2019).

    Article 

    Google Scholar 

  • Hanser, T., Werner, S. & Plante, J. FLuID POC a simulation platform for federated distillation. Zenodo (2024).

  • van der Maaten, L. & Hinton, G. Visualizing data using t-SNE. J. Mach. Learn. Res. 9, 2579–2605 (2008).

    MATH 

    Google Scholar 

  • 2025-03-05 00:00:00

    Related Articles

    Back to top button