Deep Learning-based Prediction Of The Selection Factors For Quantifying Selection In Immune Receptor Repertoires

Rocha, B. & von Boehmer, H. Peripheral selection of the T cell repertoire. Science 251, 1225–1228 (1991).

Egerton, M., Scollay, R. & Shortman, K. Kinetics of mature T-cell development in the thymus. Proc. Natl Acad. Sci. USA 87, 2579–2582 (1990).

Janeway, C. et al. Immunobiology: The Immune System in Health and Disease Vol. 2 (Garland, 2001).

Roth, D. B. V(D)J recombination: mechanism, errors, and fidelity. In Mobile DNA III 311–324 (American Society for Microbiology, 2015).

Sethna, Z. et al. Population variability in the generation and selection of T-cell repertoires. PLOS Comput. Biol. 16, e1008394 (2020).

Isacchini, G., Walczak, A. M., Mora, T. & Nourmohammad, A. Deep generative selection models of T and B cell receptor repertoires with SONIA. Proc. Natl Acad. Sci. USA 118, e2024364118 (2021).

Pai, J. A. & Satpathy, A. T. High-throughput and single-cell T cell receptor sequencing technologies. Nat. Methods 18, 881–892 (2021).

Murugan, A., Mora, T., Walczak, A. M. & Callan Jr, C. G. Statistical inference of the generation probability of T-cell receptors from sequence repertoires. Proc. Natl Acad. Sci. USA 109, 16161–16166 (2012).

Marcou, Q., Mora, T. & Walczak, A. M. High-throughput immune repertoire analysis with IGoR. Nat. Commun. 9, 561 (2018).

Sethna, Z., Elhanati, Y., Callan Jr, C. G., Walczak, A. M. & Mora, T. OLGA: fast computation of generation probabilities of B- and T-cell receptor amino acid sequences and motifs. Bioinformatics 35, 2974–2981 (2019).

Elhanati, Y., Murugan, A., Callan Jr, C. G., Mora, T. & Walczak, A. M. Quantifying selection in immune receptor repertoires. Proc. Natl Acad. Sci. USA 111, 9875–9880 (2014).

Jiang, Y. & Li, S. C. Deep autoregressive generative models capture the intrinsics embedded in T-cell receptor repertoires. Brief. Bioinform. 24, bbad038 (2023).

Davidsen, K. et al. Deep generative models for T cell receptor protein sequences. eLife 8, e46935 (2019).

Zhou, Z.-H. A brief introduction to weakly supervised learning. Natl Sci. Rev. 5, 44–53 (2018).

Li, Y.-F., Guo, L.-Z. & Zhou, Z.-H. Towards safe weakly supervised learning. IEEE Trans. Pattern Anal. Mach. Intell. 43, 334–346 (2019).

Hou, X. et al. Analysis of the repertoire features of TCR beta chain CDR3 in human by high-throughput sequencing. Cell. Physiol. Biochem. 39, 651–667 (2016).

Das, J. & Jayaprakash, C. Systems Immunology: An Introduction to Modeling Methods for Scientists (CRC, 2018).

Rao, R. et al. Evaluating protein transfer learning with TAPE. Adv. Neural. Inf. Process. Syst. 32, 9689–9701 (2019).

Devlin, J., Chang, M.-W., Lee, K. & Toutanova, K. BERT: pre-training of deep bidirectional transformers for language understanding. In Proc. 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies Vol. 1 (eds Burstein, J. et al.) 4171–4186 (Association for Computational Linguistics, 2019).

Huang, H., Wang, C., Rubelt, F., Scriba, T. J. & Davis, M. M. Analyzing the Mycobacterium tuberculosis immune response by T-cell receptor clustering with GLIPH2 and genome-wide antigen screening. Nat. Biotechnol. 38, 1194–1202 (2020).

Dash, P. et al. Quantifiable predictive features define epitope-specific T cell receptor repertoires. Nature 547, 89–93 (2017).

Chen, S.-Y., Yue, T., Lei, Q. & Guo, A.-Y. TCRdb: a comprehensive database for T-cell receptor sequences with powerful search function. Nucleic Acids Res. 49, D468–D474 (2021).

Emerson, R. O. et al. Immunosequencing identifies signatures of cytomegalovirus exposure history and HLA-mediated effects on the T cell repertoire. Nat. Genet. 49, 659–665 (2017).

Mikolov, T., Chen, K., Corrado, G. & Dean, J. Efficient estimation of word representations in vector space. Preprint at https://arxiv.org/abs/1301.3781 (2013).

Elhanati, Y., Sethna, Z., Callan Jr, C. G., Mora, T. & Walczak, A. M. Predicting the spectrum of TCR repertoire sharing with a data-driven model of recombination. Immunol. Rev. 284, 167–179 (2018).

Ruiz Ortega, M., Spisak, N., Mora, T. & Walczak, A. M. Modeling and predicting the overlap of B- and T-cell receptor repertoires in healthy and SARS-CoV-2 infected individuals. PLoS Genet. 19, e1010652 (2023).

Nolan, S. et al. A large-scale database of T-cell receptor beta sequences and binding associations from natural and synthetic exposure to SARS-CoV-2. Front. Immunol. 16, 1488851 (2025).

Weyand, C. M. & Goronzy, J. Aging of the immune system. Mechanisms and therapeutic targets. Ann. Am. Thorac. Soc. 13, S422–S428 (2016).

Egorov, E. S. et al. The changing landscape of naive T cell receptor repertoire with human aging. Front. Immunol. 9, 1618 (2018).

Britanova, O. V. et al. Age-related decrease in TCR repertoire diversity measured with deep and normalized sequence profiling. J. Immunol. 192, 2689–2698 (2014).

Palmer, D. B. The effect of age on thymic function. Front. Immunol. 4, 316 (2013).

Krishna, C., Chowell, D., Gönen, M., Elhanati, Y. & Chan, T. A. Genetic and environmental determinants of human TCR repertoire diversity. Immun. Ageing 17, 26 (2020).

Wang, G. C., Dash, P., McCullers, J. A., Doherty, P. C. & Thomas, P. G. T cell receptor αβ diversity inversely correlates with pathogen-specific antibody levels in human cytomegalovirus infection. Sci. Transl. Med. 4, 128ra42 (2012).

Jergović, M., Contreras, N. A. & Nikolich-Žugich, J. Impact of CMV upon immune aging: facts and fiction. Med. Microbiol. Immunol. 208, 263–269 (2019).

Chu, N. D. et al. Longitudinal immunosequencing in healthy people reveals persistent T cell receptors rich in highly public receptors. BMC Immunol. 20, 19 (2019).

Britanova, O. V. et al. Dynamics of individual T cell repertoires: from cord blood to centenarians. J. Immunol. 196, 5005–5013 (2016).

Bensouda Koraichi, M., Ferri, S., Walczak, A. M. & Mora, T. Inferring the T cell repertoire dynamics of healthy individuals. Proc. Natl Acad. Sci. USA 120, e2207516120 (2023).

Pogorelyy, M. V. et al. Method for identification of condition-associated public antigen receptor sequences. eLife 7, e33050 (2018).

Widrich, M. et al. Modern Hopfield networks and attention for immune repertoire classification. Adv. Neural Inf. Process. Syst. 33, 18832–18845 (2020).

Akerman, O., Isakov, H., Levi, R., Psevkin, V. & Louzoun, Y. Counting is almost all you need. Front. Immunol. 13, 1031011 (2023).

Shugay, M. et al. VDJdb: a curated database of T-cell receptor sequences with known antigen specificity. Nucleic Acids Res. 46, D419–D427 (2018).

Joglekar, A. V. & Li, G. T cell antigen discovery. Nat. Methods 18, 873–880 (2021).

Hudson, D., Fernandes, R. A., Basham, M., Ogg, G. & Koohy, H. Can we predict T cell specificity with digital biology and machine learning? Nat. Rev. Immunol. 23, 511–521 (2023).

Gomez-Tourino, I., Kamra, Y., Baptista, R., Lorenc, A. & Peakman, M. T cell receptor β-chains display abnormal shortening and repertoire sharing in type 1 diabetes. Nat. Commun. 8, 1792 (2017).

Savola, P. et al. Somatic mutations in clonally expanded cytotoxic T lymphocytes in patients with newly diagnosed rheumatoid arthritis. Nat. Commun. 8, 15869 (2017).

Huth, A., Liang, X., Krebs, S., Blum, H. & Moosmann, A. Antigen-specific TCR signatures of cytomegalovirus infection. J. Immunol. 202, 979–990 (2019).

Faham, M. et al. Discovery of T cell receptor β motifs specific to HLA–B27–positive ankylosing spondylitis by deep repertoire sequence analysis. Arthritis Rheumatol. 69, 774–784 (2017).

Zhao, Y. et al. Preferential use of public TCR during autoimmune encephalomyelitis. J. Immunol. 196, 4905–4914 (2016).

Lu, C. et al. Clinical significance of T cell receptor repertoire in primary Sjogren’s syndrome. EBioMedicine 84, 104252 (2022).

Seder, R. A., Darrah, P. A. & Roederer, M. T-cell quality in memory and protection: implications for vaccine design. Nat. Rev. Immunol. 8, 247–258 (2008).

Feng, X. et al. A comprehensive benchmarking for evaluating TCR embeddings in modeling TCR-epitope interactions. Brief. Bioinform. 26, bbaf030 (2025).

Kobyzev, I., Prince, S. J. D. & Brubaker, M. A. Normalizing flows: an introduction and review of current methods. IEEE Trans. Pattern Anal. Mach. Intell. 43, 3964–3979 (2020).

Zhang, W., Gou, Y., Jiang, Y. & Zhang, Y. Adversarial VAE with normalizing flows for multi-dimensional classification. In Chinese Conference on Pattern Recognition and Computer Vision (PRCV) 205–219 (Springer, 2022).

Zhang, Y., Tiňo, P., Leonardis, A. & Tang, K. A survey on neural network interpretability. IEEE Trans. Emerg. Top. Comput. Intell. 5, 726–742 (2021).

Jiang, Y., Huo, M., Zhang, P., Zou, Y. & Li, S. TCR2vec: a deep representation learning framework of T-cell receptor sequence and function. Preprint at bioRxiv https://doi.org/10.1101/2023.03.31.535142 (2023).

Vaswani, A. et al. Attention is all you need. Adv. Neural Inf. Process. Syst. 30, 5998–6008 (2017).

Jiang, Y., Rensi, S., Wang, S. & Altman, R. B. DrugOrchestra: jointly predicting drug response, targets, and side effects via deep multi-task learning. Preprint at bioRxiv https://doi.org/10.1101/2020.11.17.385757 (2020).

Dai, Z. et al. BEV-Net: assessing social distancing compliance by joint people localization and geometric reasoning. In Proc. IEEE/CVF International Conference on Computer Vision 5401–5411 (IEEE, 2021).

Yamada, M., Suzuki, T., Kanamori, T., Hachiya, H. & Sugiyama, M. Relative density-ratio estimation for robust distribution comparison. Neural Comput. 25, 1324–1370 (2013).

Kingma, D. & Ba, J. Adam: a method for stochastic optimization. In Proc. International Conference on Learning Representations (ICLR, 2015).

Slabodkin, A. et al. Individualized VDJ recombination predisposes the available IG sequence space. Genome Res. 31, 2209–2224 (2021).

Pedregosa, F. et al. Scikit-learn: machine learning in Python. J. Mach. Learn. Res. 12, 2825–2830 (2011).

Virtanen, P. et al. SciPy 1.0: fundamental algorithms for scientific computing in Python. Nat. Methods 17, 261–272 (2020).

Jiang, Y. Tcrsep. Zenodo https://doi.org/10.5281/zenodo.15691314 (2025).