Data-driven federated learning in drug discovery with knowledge distillation
Liu, X. et al. A comparison of deep learning performance against health-care professionals in detecting diseases from medical imaging: a systematic review and meta-analysis. Lancet Digit. Health 1, e271–e297 (2019).
Zhou, W. et al. Ensembled deep learning model outperforms human experts in diagnosing biliary atresia from sonographic gallbladder images. Nat. Commun. 12, 1259 (2021).
Topaloglu, M. Y., Morrell, E. M., Rajendran, S. & Topaloglu, U. In the pursuit of privacy: the promises and predicaments of federated learning in healthcare. Front. Artif. Intell. 4, 746497 (2021).
Brauneck, A. et al. Federated machine learning in data-protection-compliant research. Nat. Mach. Intell. 5, 2–4 (2023).
Bak, M. et al. Federated learning is not a cure-all for data ethics. Nat. Mach. Intell. 6, 370–372 (2024).
Zhu, H., Xu, J., Liu, S. & Jin, Y. Federated learning on non-IID data: a survey. Neurocomputing 465, 371–390 (2021).
McMahan, B., Moore, E., Ramage, D., Hampson, S. & Arcas, B. A. y. Communication-efficient learning of deep networks from decentralized data. In Proc. 20th International Conference on Artificial Intelligence and Statistics PMLR 54, 1273–1282 (2017).
Zhou, J. et al. A survey on federated learning and its applications for accelerating industrial internet of things. Preprint at (2021).
Li, L., Fan, Y., Tse, M. & Lin, K.-Y. A review of applications in federated learning. Comput. Ind. Eng. 149, 106854 (2020).
Li, T., Sahu, A. K., Talwalkar, A. & Smith, V. Federated learning: challenges, methods, and future directions. IEEE Signal Process. Mag. 37, 50–60 (2020).
Yin, X., Zhu, Y. & Hu, J. A comprehensive survey of privacy-preserving federated learning: a taxonomy, review, and future directions. ACM Comput. Surv. 54, 131:1–131:36 (2021).
Kairouz, P. et al. Advances and open problems in federated learning. Found. Trends Mach. Learn. 14, 1–210 (2021).
Liu, J. et al. From distributed machine learning to federated learning: a survey. Knowl. Inf. Syst. 64, 885–917 (2022).
Konečný, J., McMahan, H. B., Ramage, D. & Richtárik, P. Federated optimization: distributed machine learning for on-device intelligence. Preprint at (2016).
Abadi, M. et al. Deep learning with differential privacy. In Proc. 2016 ACM SIGSAC Conference on Computer and Communications Security 308–318 (ACM, 2016).
Dwork, C. Differential privacy: a survey of results. In Proc. International Conference on Theory and Applications of Models of Computation (eds Agrawal, M. et al.) 1–19 (Springer, 2008).
Long, G., Tan, Y., Jiang, J. & Zhang, C. in Federated Learning: Privacy and Incentive (eds Yang, Q. et al.) 240–254 (Springer, 2020).
Rieke, N. et al. The future of digital health with federated learning. npj Digit. Med. 3, 119 (2020).
Choudhury, O. et al. Predicting adverse drug reactions on distributed health data using federated learning. AMIA Annu. Symp. Proc. 2019, 313–322 (2020).
Nguyen, D. C. et al. Federated learning for smart healthcare: a survey. ACM Comput. Surv. 55, 1–37 (2022).
Xiong, Z. et al. Facing small and biased data dilemma in drug discovery with enhanced federated learning approaches. Sci. China Life Sci. 65, 529–539 (2022).
Manu, D. et al. FL-DISCO: federated generative adversarial network for graph-based molecule drug discovery: special session paper. In Proc. 2021 IEEE/ACM International Conference On Computer Aided Design (ICCAD) 1–7 (IEEE, 2021).
Naz, S., Phan, K. T. & Chen, Y.-P. P. A comprehensive review of federated learning for COVID-19 detection. Int. J. Intell. Syst. 37, 2371–2392 (2022).
Goldsmith, M. R. et al. in Crop Protection Products for Sustainable Agriculture (eds Rauzan, B. M. & Lorsbach, B. A.) Vol. 1390, 181–200 (American Chemical Society, 2021).
Heyndrickx, W. et al. MELLODDY: cross-pharma federated learning at unprecedented scale unlocks benefits in QSAR without compromising proprietary information. J. Chem. Inf. Model. 64, 2331–2344 (2024).
Hanser, T. Federated learning for molecular discovery. Curr. Opin. Struct. Biol. 79, 102545 (2023).
Konečný, J. et al. Federated learning: strategies for improving communication efficiency. Preprint at (2017).
Wu, C., Wu, F., Lyu, L., Huang, Y. & Xie, X. Communication-efficient federated learning via knowledge distillation. Nat. Commun. 13, 2032 (2022).
Zhu, X. Semi-Supervised Learning Literature Survey (Univ. Wisconsin, 2005).
Hinton, G., Vinyals, O. & Dean, J. Distilling the knowledge in a neural network. Preprint at (2015).
Papernot, N., Abadi, M., Erlingsson, Ú., Goodfellow, I. & Talwar, K. Semi-supervised knowledge transfer for deep learning from private training data. Preprint at (2016).
Papernot, N. et al. Scalable private learning with PATE. Preprint at (2018).
Breiman, L. Bagging predictors. Mach. Learn. 24, 123–140 (1996).
Dietterich, T. G. Ensemble methods in machine learning. In Proc. International Workshop on Multiple Classifier Systems (eds Kittler, J. & Roli, F.) 1–15 (Springer, 2000).
Li, L. et al. Federated distillation: a survey. Preprint at (2024).
Eldar, Y. C. et al. in Machine Learning and Wireless Communications (eds Goldsmith, A. et al.) 457–485 (Cambridge Univ. Press, 2022).
Li, D. & Wang, J. FedMD: heterogenous federated learning via model distillation. Preprint at (2019).
Itahara, S., Nishio, T., Koda, Y., Morikura, M. & Yamamoto, K. Distillation-based semi-supervised federated learning for communication-efficient collaborative training with non-IID private data. IEEE Trans. Mob. Comput. 22, 191–205 (2023).
Sattler, F., Marban, A., Rischke, R. & Samek, W. Communication-efficient federated distillation. Preprint at (2020).
Sui, D. et al. FedED: federated learning via ensemble distillation for medical relation extraction. In Proc. 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP) (eds Webber, B. et al.) 2118–2128 (Association for Computational Linguistics, 2020).
Han, S. et al. FedX: unsupervised federated learning with cross knowledge distillation. In European Conference on Computer Vision (eds Avidan, S. et al.) 691–707 (Springer Nature Switzerland, 2022).
Jeong, E. et al. Communication-efficient on-device machine learning: federated distillation and augmentation under non-IID private data. Preprint at (2023).
Choquette-Choo, C. A. et al. CaPC learning: confidential and private collaborative learning. Preprint at (2021).
PyGrid: a peer-to-peer platform for private data science and federated learning. OpenMined Blog (2020).
FLuID POC platform. GitHub (2023).
Hancox, J. C., McPate, M. J., El Harchi, A. & Zhang, Y. H. The hERG potassium channel and hERG screening for drug-induced torsades de pointes. Pharmacol. Ther. 119, 118–132 (2008).
Wolford, B. What is GDPR, the EU’s new data protection law? GDPR.eu (2018).
Shokri, R., Stronati, M., Song, C. & Shmatikov, V. Membership inference attacks against machine learning models. In IEEE Symposium on Security and Privacy (SP) 3–18 (IEEE, 2017).
Raipuria, G., Bonthu, S. & Singhal, N. Noise robust training of segmentation model using knowledge distillation. In Proc. Pattern Recognition. ICPR International Workshops and Challenges. ICPR 2021 (eds Del Bimbo, A. et al.) 97–104 (Springer, 2021).
Matthews, B. W. Comparison of the predicted and observed secondary structure of T4 phage lysozyme. Biochim. Biophys. Acta 405, 442–451 (1975).
Bassani, D., Brigo, A. & Andrews-Morger, A. Federated learning in computational toxicology: an industrial perspective on the Effiris Hackathon. Chem. Res. Toxicol. 36, 1503–1517 (2023).
Kim, S. et al. PubChem in 2021: new data content and improved web interfaces. Nucleic Acids Res. 49, D1388–D1395 (2021).
Bajusz, D., Rácz, A. & Héberger, K. Why is Tanimoto index an appropriate choice for fingerprint-based similarity calculations? J. Cheminformatics 7, 20 (2015).
Glen, R. et al. Circular fingerprints: flexible molecular descriptors with applications from physical chemistry to ADME. IDrugs Investig. Drugs J. 9, 199–204 (2006).
Hudson, B. D., Hyde, R. M., Rahr, E., Wood, J. & Osman, J. Parameter based methods for compound selection from chemical databases. Quant. Struct. Act. Relatsh. 15, 285–289 (1996).
Rogers, D. & Hahn, M. Extended-connectivity fingerprints. J. Chem. Inf. Model. 50, 742–754 (2010).
Maggiora, G., Vogt, M., Stumpfe, D. & Bajorath, J. Molecular similarity in medicinal chemistry. J. Med. Chem. 57, 3186–3204 (2014).
Maggiora, G. M. Concepts and Applications of Molecular Similarity (eds Johnson, M. A. & Maggiora, G. M.) (John Wiley & Sons, 1990).
Pan, S. J. & Yang, Q. A survey on transfer learning. IEEE Trans. Knowl. Data Eng. 22, 1345–1359 (2010).
Gaulton, A. et al. ChEMBL: a large-scale bioactivity database for drug discovery. Nucleic Acids Res. 40, D1100–D1107 (2012).
MacQueen, J. Some methods for classification and analysis of multivariate observations. In Proc. Fifth Berkeley Symp. Math. Stat. Probab. (eds Marie Le Cam, L. & Neyman, J.) Vol. 1, 281–298 (1967).
Siramshetty, V. B., Chen, Q., Devarakonda, P. & Preissner, R. The catch-22 of predicting hERG blockade using publicly accessible bioactivity data. J. Chem. Inf. Model. 58, 1224–1233 (2018).
Ho, T. K. Random decision forests. In Proc. 3rd International Conference on Document Analysis and Recognition Vol. 1, 278–282 (1995).
Hanser, T., Barber, C., Marchaland, J. F. & Werner, S. Applicability domain: towards a more formal definition. SAR QSAR Environ. Res. (2016).
Hanser, T. et al. Self organising hypothesis networks: a new approach for representing and structuring SAR knowledge. J. Cheminformatics 6, 21 (2014).
Carhart, R., Smith, D. H. & Venkataraghavan, R. Atom pairs as molecular features in structure-activity studies: definition and applications. J. Chem. Inf. Comput. Sci. (1985).
Hanser, T., Steinmetz, F. P., Plante, J., Rippmann, F. & Krier, M. Avoiding hERG-liability in drug design via synergetic combinations of different (Q)SAR methodologies and data sources: a case study in an industrial setting. J. Cheminformatics 11, 9 (2019).
Hanser, T., Werner, S. & Plante, J. FLuID POC a simulation platform for federated distillation. Zenodo (2024).
van der Maaten, L. & Hinton, G. Visualizing data using t-SNE. J. Mach. Learn. Res. 9, 2579–2605 (2008).
2025-03-05 00:00:00