Machine learning prediction of enzyme optimum pH

Barroca, M. et al. Deciphering the factors defining the pH-dependence of a commercial glycoside hydrolase family 8 enzyme. Enzyme Microb. Technol. 96, 163–169 (2017).
Reed, C. J., Lewis, H., Trejo, E., Winston, V. & Evilia, C. Protein adaptations in Archaeal extremophiles. Archaea 2013, 373275 (2013).
Protze, J. et al. An extracellular tetrathionate hydrolase from the thermoacidophilic archaeon Acidianus ambivalens with an activity optimum at pH 1. Front. Microbiol. 2, 68 (2011).
Pradeep, G. C. et al. An extremely alkaline novel chitinase from Streptomyces sp. CS495. Process Biochem. 49, 223–229 (2014).
Ferrer, M., Golyshina, O., Beloqui, A. & Golyshin, P. N. Mining enzymes from extreme environments. Curr. Opin. Microbiol. 10, 207–214 (2007).
Thomas, N. et al. Engineering of highly active and diverse nuclease enzymes by combining machine learning and ultra-high-throughput screening. Preprint at bioRxiv https://doi.org/10.1101/2024.03.21.585615 (2024).
Verma, D. & Satyanarayana, T. Xylanolytic extremozymes retrieved from environmental metagenomes: characteristics, genetic engineering, and applications. Front. Microbiol. 11, 551109 (2020).
Shahraki, M. F. et al. A computational learning paradigm to targeted discovery of biocatalysts from metagenomic data: a case study of lipase identification. Biotechnol. Bioeng. 119, 1115–1128 (2022).
Erickson, E. et al. Sourcing thermotolerant poly(ethylene terephthalate) hydrolase scaffolds from natural diversity. Nat. Commun. 13, 7850 (2022).
Wang, C.-H., Liu, X.-L., Huang, R.-B., He, B.-F. & Zhao, M.-M. Enhanced acidic adaptation of Bacillus subtilis Ca-independent alpha-amylase by rational engineering of pKa values. Biochem. Eng. J. 139, 146–153 (2018).
dos Santos, J. P., da Rosa Zavareze, E., Dias, A. R. G. & Vanier, N. L. Immobilization of xylanase and xylanase-β-cyclodextrin complex in polyvinyl alcohol via electrospinning improves enzyme activity at a wide pH and temperature range. Int. J. Biol. Macromol. 118, 1676–1684 (2018).
Giri, P., Pagar, A. D., Patil, M. D. & Yun, H. Chemical modification of enzymes to improve biocatalytic performance. Biotechnol. Adv. 53, 107868 (2021).
Xue, Y. et al. Chemical modification of stem bromelain with anhydride groups to enhance its stability and catalytic activity. J. Mol. Catal. B 63, 188–193 (2010).
Li, C. Effects of chemical modification by chitooligosaccharide on enzyme activity and stability of yeast β-d-fructofuranosidase. Enzyme Microb. Technol. 64–65, 24–32 (2014).
Li, S.-F., Cheng, F., Wang, Y.-J. & Zheng, Y.-G. Strategies for tailoring pH performances of glycoside hydrolases. Crit. Rev. Biotechnol. 43, 121–141 (2023).
Shi, X., Wu, D., Xu, Y. & Yu, X. Engineering the optimum pH of β-galactosidase from Aspergillus oryzae for efficient hydrolysis of lactose. J. Dairy Sci. 105, 4772–4782 (2022).
Hebditch, M. & Warwicker, J. Web-based display of protein surface and pH-dependent properties for assessing the developability of biotherapeutics. Sci. Rep. 9, 1969 (2019).
Schmitz, M. et al. patcHwork: a user-friendly pH sensitivity analysis web server for protein sequences and structures. Nucleic Acids Res. 50, W560–W567 (2022).
Oeller, M. et al. Sequence-based prediction of pH-dependent protein solubility using CamSol. Brief. Bioinform. 24, bbad004 (2023).
Zhang, G., Li, H. & Fang, B. Discriminating acidic and alkaline enzymes using a random forest model with secondary structure amino acid composition. Process Biochem. 44, 654–660 (2009).
Lin, H., Chen, W. & Ding, H. AcalPred: a sequence-based tool for discriminating between acidic and alkaline enzymes. PLoS ONE 8, e75726 (2013).
Fan, G.-L., Li, Q.-Z. & Zuo, Y.-C. Predicting acidic and alkaline enzymes by incorporating the average chemical shift and gene ontology informations into the general form of Chou’s PseAAC. Process Biochem. 48, 1048–1053 (2013).
Khan, Z. U., Hayat, M. & Khan, M. A. Discrimination of acidic and alkaline enzyme using Chou’s pseudo amino acid composition in conjunction with probabilistic neural network model. J. Theor. Biol. 365, 197–203 (2015).
Yan, S. & Wu, G. Predicting pH optimum for activity of beta-glucosidases. J. Biomed. Sci. Eng. 12, 354–367 (2019).
Wang, X., Li, H., Gao, P., Liu, Y. & Zeng, W. Combining support vector machine with dual g-gap dipeptides to discriminate between acidic and alkaline enzymes. Lett. Org. Chem. 16, 325–331 (2019).
Li, X. et al. A sequence embedding method for enzyme optimal condition analysis. BMC Bioinform. 21, 512 (2020).
Schomburg, I. et al. The BRENDA enzyme information system—from a database to an expert system. J. Biotechnol. 261, 194–206 (2017).
Puissant, J. et al. The pH optimum of soil exoenzymes adapt to long term changes in soil pH. Soil Biol. Biochem. 138, 107601 (2019).
Li, G. et al. Learning deep representations of enzyme thermal adaptation. Protein Sci. 31, e4480 (2022).
Reimer, L. C. et al. BacDive in 2022: the knowledge base for standardized bacterial and archaeal data. Nucleic Acids Res. 50, D741–D746 (2022).
Sayers, E. W. et al. Database resources of the National Center for Biotechnology Information. Nucleic Acids Res. 49, D10–D17 (2021).
Teufel, F. et al. SignalP 6.0 predicts all five types of signal peptides using protein language models. Nat. Biotechnol. 40, 1023–1025 (2022).
Booth, I. R. Regulation of cytoplasmic pH in bacteria. Microbiol. Rev. 49, 359–378 (1985).
Baker-Austin, C. & Dopson, M. Life in acid: pH homeostasis in acidophiles. Trends Microbiol. 15, 165–171 (2007).
Hough, D. W. & Danson, M. J. Extremozymes. Curr. Opin. Chem. Biol. 3, 39–46 (1999).
Tan, C. et al. A survey on deep transfer learning. In Artificial Neural Networks and Machine Learning–ICANN 2018: 27th International Conference on Artificial Neural Networks (eds Kůrková, V. et al.) 270–279 (Springer, 2018).
Gado, J. E., Beckham, G. T. & Payne, C. M. Improving enzyme optimum temperature prediction with resampling strategies and ensemble learning. J. Chem. Inf. Model. 60, 4098–4107 (2020).
Branco, P., Torgo, L. & Ribeiro, R. P. Pre-processing approaches for imbalanced distributions in regression. Neurocomputing 343, 76–99 (2019).
Yang, Y., Zha, K., Chen, Y.-C., Wang, H. & Katabi, D. Delving into deep imbalanced regression. In Proc. 38th International Conference on Machine Learning (eds Meila, M. and Zhang, T.) 11842–11851 (PMLR, 2021).
Chen, Z. et al. iFeature: a Python package and web server for features extraction and selection from protein and peptide sequences. Bioinformatics 34, 2499–2502 (2018).
Unsal, S. et al. Learning functional properties of proteins with language models. Nat. Mach. Intell. 4, 227–245 (2022).
Rives, A. et al. Biological structure and function emerge from scaling unsupervised learning to 250 million protein sequences. Proc. Natl Acad. Sci. USA 118, e2016239118 (2021).
Meier, J. et al. Language models enable zero-shot prediction of the effects of mutations on protein function. In Advances in Neural Information Processing Systems (eds Ranzato, M. et al.) 29287–29303 (Curran Associates, 2021).
Elnaggar, A. et al. ProtTrans: toward understanding the language of life through self-supervised learning. IEEE Trans. Pattern Anal. Mach. Intell. 44, 7112–7127 (2022).
Notin, P. et al. Tranception: protein fitness prediction with autoregressive transformers and inference-time retrieval. In Proc. 39th International Conference on Machine Learning (eds Chaudhuri, K. et al.) 16990–17017 (PMLR, 2022).
Nijkamp, E., Ruffolo, J., Weinstein, E. N., Naik, N. & Madani, A. ProGen2: exploring the boundaries of protein language models. Cell Syst. 14, 968–978.e3 (2023).
Yang, K. K., Lu, A. X. & Fusi, N. Convolutions are competitive with transformers for protein sequence pretraining. Cell Syst. 15, 286–294 (2024).
Stärk, H., Dallago, C., Heinzinger, M. & Rost, B. Light attention predicts protein location from the language of life. Bioinform. Adv. 1, vbab035 (2021).
Bergstra, J. & Bengio, Y. Random search for hyper-parameter optimization. J. Mach. Learn. Res. 13, 281–305 (2012).
Li, G. et al. Performance of regression models as a function of experiment noise. Bioinform. Biol. Insights 15, 11779322211020315 (2021).
Detlefsen, N. S., Hauberg, S. & Boomsma, W. Learning meaningful representations of protein sequences. Nat. Commun. 13, 1914 (2022).
Kroll, A. & Lercher, M. J. Machine learning models for the prediction of enzyme properties should be tested on proteins not used for model training. Preprint at bioRxiv https://doi.org/10.1101/2023.02.06.526991 (2023).
Suplatov, D. et al. Computational design of a pH stable enzyme: understanding molecular mechanism of penicillin acylase’s adaptation to alkaline conditions. PLoS ONE 9, e100643 (2014).
Huang, Y., Krauss, G., Cottaz, S., Driguez, H. & Lipps, G. A highly acid-stable and thermostable endo-β-glucanase from the thermoacidophilic archaeon Sulfolobus solfataricus. Biochem. J. 385, 581–588 (2005).
Mamo, G., Thunnissen, M., Hatti-Kaul, R. & Mattiasson, B. An alkaline active xylanase: insights into mechanisms of high pH catalytic adaptation. Biochimie 91, 1187–1196 (2009).
Wang, Y., Xu, M., Yang, T., Zhang, X. & Rao, Z. Surface charge-based rational design of aspartase modifies the optimal pH for efficient β-aminobutyric acid production. Int. J. Biol. Macromol. 164, 4165–4172 (2020).
Jakob, F. et al. Surface charge engineering of a Bacillus gibsonii subtilisin protease. Appl. Microbiol. Biotechnol. 97, 6793–6802 (2013).
Yang, T. et al. N20D/N116E combined mutant downward shifted the pH optimum of Bacillus subtilis NADH oxidase. Biology 12, 522 (2023).
Masui, A., Fujiwara, N., Yamamoto, K., Takagi, M. & Imanaka, T. Rational design for stabilization and optimum pH shift of serine protease AprN. J. Ferment. Bioeng. 85, 30–36 (1998).
Turunen, O., Vuorio, M., Fenel, F. & Leisola, M. Engineering of multiple arginines into the Ser/Thr surface of Trichoderma reesei endo-1,4-β-xylanase II increases the thermotolerance and shifts the pH optimum towards alkaline pH. Protein Eng. 15, 141–145 (2002).
Li, Q., Jiang, T., Liu, R., Feng, X. & Li, C. Tuning the pH profile of β-glucuronidase by rational site-directed mutagenesis for efficient transformation of glycyrrhizin. Appl. Microbiol. Biotechnol. 103, 4813–4823 (2019).
Pokhrel, S., Joo, J. C. & Yoo, Y. J. Shifting the optimum pH of Bacillus circulans xylanase towards acidic side by introducing arginine. Biotechnol. Bioprocess Eng. 18, 35–42 (2013).
Carvalho, D. V., Pereira, E. M. & Cardoso, J. S. Machine learning interpretability: a survey on methods and metrics. Electronics 8, 832 (2019).
Olsson, M. H. M., Søndergaard, C. R., Rostkowski, M. & Jensen, J. H. PROPKA3: consistent treatment of internal and surface residues in empirical pKa predictions. J. Chem. Theory Comput. 7, 525–537 (2011).
Talley, K. & Alexov, E. On the pH-optimum of activity and stability of proteins. Proteins 78, 2699–2706 (2010).
Alexov, E. Numerical calculations of the pH of maximal protein stability. The effect of the sequence composition and three-dimensional structure. Eur. J. Biochem. 271, 173–185 (2004).
Yu, T. et al. Enzyme function prediction using contrastive learning. Science 379, 1358–1363 (2023).
Pak, M. A., Dovidchenko, N. V., Sharma, S. M. & Ivankov, D. N. New mega dataset combined with deep neural network makes a progress in predicting impact of mutation on protein stability. Preprint at bioRxiv https://doi.org/10.1101/2022.12.31.522396 (2023).
Kroll, A., Rousset, Y., Hu, X.-P., Liebrand, N. A. & Lercher, M. J. Turnover number predictions for kinetically uncharacterized enzymes using machine and deep learning. Nat. Commun. 14, 4139 (2023).
Gelman, S., Fahlberg, S. A., Heinzelman, P., Romero, P. A. & Gitter, A. Neural networks to learn protein sequence–function relationships from deep mutational scanning data. Proc. Natl Acad. Sci. USA 118, e2104878118 (2021).
Dallago, C. et al. FLIP: benchmark tasks in fitness landscape inference for proteins. Preprint at bioRxiv https://doi.org/10.1101/2021.11.09.467890 (2022).
Sledzieski, S. et al. Democratizing protein language models with parameter-efficient fine-tuning. Proc. Natl Acad. Sci. USA 121, e2405840121 (2024).
Xu, M. et al. PEER: a comprehensive and multi-task benchmark for protein sequence understanding. Adv. Neural Inf. Process. Syst. 35, 35156–35173 (2022).
Ferdous, S., Shihab, I. F. & Reuel, N. F. Effects of sequence features on machine-learned enzyme classification fidelity. Biochem. Eng. J. 187, 108612 (2022).
Li, G., Rabe, K. S., Nielsen, J. & Engqvist, M. K. M. Machine learning applied to predicting microorganism growth temperatures and enzyme catalytic optima. ACS Synth. Biol. 8, 1411–1420 (2019).
Liu, H., HaoChen, J. Z., Gaidon, A. & Ma, T. Self-supervised learning is more robust to dataset imbalance. Preprint at https://arxiv.org/abs/2110.05025 (2022).
Zaretckii, M., Buslaev, P., Kozlovskii, I., Morozov, A. & Popov, P. Approaching optimal pH enzyme prediction with large language models. ACS Synth. Biol. https://doi.org/10.1021/acssynbio.4c00465 (2024).
Song, Y. et al. Accurately predicting enzyme functions through geometric graph learning on ESMFold-predicted structures. Nat. Commun. 15, 8180 (2024).
Richardson, L. et al. MGnify: the microbiome sequence data analysis resource in 2023. Nucleic Acids Res. 51, D753–D759 (2023).
UniProt Consortium. UniProt: the Universal Protein Knowledgebase in 2023. Nucleic Acids Res. 51, D523–D531 (2023).
Hopf, T. A. et al. Mutation effects predicted from sequence co-variation. Nat. Biotechnol. 35, 128–135 (2017).
Hie, B. L. & Yang, K. K. Adaptive machine learning for protein engineering. Curr. Opin. Struct. Biol. 72, 145–152 (2022).
Yang, K. K., Wu, Z. & Arnold, F. H. Machine-learning-guided directed evolution for protein engineering. Nat. Methods 16, 687–694 (2019).
Hawkins-Hooker, A. et al. Generating functional protein variants with variational autoencoders. PLoS Comput. Biol. 17, e1008736 (2021).
Strokach, A. & Kim, P. M. Deep generative modeling for protein design. Curr. Opin. Struct. Biol. 72, 226–236 (2022).
Steinegger, M. & Söding, J. MMseqs2 enables sensitive protein sequence searching for the analysis of massive data sets. Nat. Biotechnol. 35, 1026–1028 (2017).
Cock, P. J. A. et al. Biopython: freely available Python tools for computational molecular biology and bioinformatics. Bioinformatics 25, 1422–1423 (2009).
Cui, Y., Jia, M., Lin, T.-Y., Song, Y. & Belongie, S. Class-balanced loss based on effective number of samples. In Proc. 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (IEEE, 2019).
Riesselman, A. J., Ingraham, J. B. & Marks, D. S. Deep generative models of genetic variation capture the effects of mutations. Nat. Methods 15, 816–822 (2018).
Guyon, I., Weston, J., Barnhill, S. & Vapnik, V. Gene selection for cancer classification using support vector machines. Mach. Learn. 46, 389–422 (2002).
Shin, J.-E. et al. Protein design and variant prediction using autoregressive generative models. Nat. Commun. 12, 2403 (2021).
van den Oord, A. et al. WaveNet: a generative model for raw audio. Preprint at https://arxiv.org/abs/1609.03499 (2016).
Cho, K. et al. Learning phrase representations using RNN encoder–decoder for statistical machine translation. In Proc. 2014 Conference on Empirical Methods in Natural Language Processing (eds Moschitti, A. et al.) 1724–1734 (Association for Computational Linguistics, 2014); https://doi.org/10.3115/v1/D14-1179
Pascanu, R., Mikolov, T. & Bengio, Y. On the difficulty of training recurrent neural networks. In Proc. 30th International Conference on Machine Learning (eds Dasgupta, S. & McAllester, D.) 1310–1318 (PMLR, 2013).
Hochreiter, S. & Schmidhuber, J. Long short-term memory. Neural Comput. 9, 1735–1780 (1997).
Alley, E. C., Khimulya, G., Biswas, S., AlQuraishi, M. & Church, G. M. Unified rational protein engineering with sequence-based deep representation learning. Nat. Methods 16, 1315–1322 (2019).
Pedregosa, F. et al. Scikit-learn: machine learning in Python. J. Mach. Learn. Res. 12, 2825–2830 (2011).
Paszke, A. et al. PyTorch: an imperative style, high-performance deep learning library. In Advances in Neural Information Processing Systems (eds Wallach, H. et al.) 8024–8035 (Curran Associates, 2019).
Varadi, M. et al. AlphaFold Protein Structure Database: massively expanding the structural coverage of protein-sequence space with high-accuracy models. Nucleic Acids Res. 50, D439–D444 (2022).
Mirdita, M. et al. ColabFold: making protein folding accessible to all. Nat. Methods 19, 679–682 (2022).
Shrake, A. & Rupley, J. A. Environment and exposure to solvent of protein atoms. Lysozyme and insulin. J. Mol. Biol. 79, 351–371 (1973).
Rost, B. & Sander, C. Conservation and prediction of solvent accessibility in protein families. Proteins 20, 216–226 (1994).
Savojardo, C., Manfredi, M., Martelli, P. L. & Casadio, R. Solvent accessibility of residues undergoing pathogenic variations in humans: from protein structures to protein sequences. Front. Mol. Biosci. 7, 626363 (2021).
Gado, J. E. et al. Machine learning prediction of enzyme optimal pH. Zenodo https://doi.org/10.5281/ZENODO.14252615 (2023).
Gado, J. jafetgado/EpHod: v1.0.0. Zenodo https://doi.org/10.5281/ZENODO.15015125 (2025).
Austin, H. P. et al. Characterization and engineering of a plastic-degrading aromatic polyesterase. Proc. Natl Acad. Sci. USA 115, E4350–E4357 (2018).
2025-04-29 00:00:00