Machine learning prediction of enzyme optimum pH

  • Barroca, M. et al. Deciphering the factors defining the pH-dependence of a commercial glycoside hydrolase family 8 enzyme. Enzyme Microb. Technol. 96, 163–169 (2017).

  • Reed, C. J., Lewis, H., Trejo, E., Winston, V. & Evilia, C. Protein adaptations in Archaeal extremophiles. Archaea 2013, 373275 (2013).

  • Protze, J. et al. An extracellular tetrathionate hydrolase from the thermoacidophilic archaeon Acidianus ambivalens with an activity optimum at pH 1. Front. Microbiol. 2, 68 (2011).

  • Pradeep, G. C. et al. An extremely alkaline novel chitinase from Streptomyces sp. CS495. Process Biochem. 49, 223–229 (2014).

  • Ferrer, M., Golyshina, O., Beloqui, A. & Golyshin, P. N. Mining enzymes from extreme environments. Curr. Opin. Microbiol. 10, 207–214 (2007).

  • Thomas, N. et al. Engineering of highly active and diverse nuclease enzymes by combining machine learning and ultra-high-throughput screening. Preprint at bioRxiv https://doi.org/10.1101/2024.03.21.585615 (2024).

  • Verma, D. & Satyanarayana, T. Xylanolytic extremozymes retrieved from environmental metagenomes: characteristics, genetic engineering, and applications. Front. Microbiol. 11, 551109 (2020).

  • Shahraki, M. F. et al. A computational learning paradigm to targeted discovery of biocatalysts from metagenomic data: a case study of lipase identification. Biotechnol. Bioeng. 119, 1115–1128 (2022).

  • Erickson, E. et al. Sourcing thermotolerant poly(ethylene terephthalate) hydrolase scaffolds from natural diversity. Nat. Commun. 13, 7850 (2022).

  • Wang, C.-H., Liu, X.-L., Huang, R.-B., He, B.-F. & Zhao, M.-M. Enhanced acidic adaptation of Bacillus subtilis Ca-independent alpha-amylase by rational engineering of pKa values. Biochem. Eng. J. 139, 146–153 (2018).

  • dos Santos, J. P., da Rosa Zavareze, E., Dias, A. R. G. & Vanier, N. L. Immobilization of xylanase and xylanase-β-cyclodextrin complex in polyvinyl alcohol via electrospinning improves enzyme activity at a wide pH and temperature range. Int. J. Biol. Macromol. 118, 1676–1684 (2018).

  • Giri, P., Pagar, A. D., Patil, M. D. & Yun, H. Chemical modification of enzymes to improve biocatalytic performance. Biotechnol. Adv. 53, 107868 (2021).

  • Xue, Y. et al. Chemical modification of stem bromelain with anhydride groups to enhance its stability and catalytic activity. J. Mol. Catal. B 63, 188–193 (2010).

  • Li, C. Effects of chemical modification by chitooligosaccharide on enzyme activity and stability of yeast β-d-fructofuranosidase. Enzyme Microb. Technol. 64–65, 24–32 (2014).

  • Li, S.-F., Cheng, F., Wang, Y.-J. & Zheng, Y.-G. Strategies for tailoring pH performances of glycoside hydrolases. Crit. Rev. Biotechnol. 43, 121–141 (2023).

  • Shi, X., Wu, D., Xu, Y. & Yu, X. Engineering the optimum pH of β-galactosidase from Aspergillus oryzae for efficient hydrolysis of lactose. J. Dairy Sci. 105, 4772–4782 (2022).

  • Hebditch, M. & Warwicker, J. Web-based display of protein surface and pH-dependent properties for assessing the developability of biotherapeutics. Sci. Rep. 9, 1969 (2019).

  • Schmitz, M. et al. patcHwork: a user-friendly pH sensitivity analysis web server for protein sequences and structures. Nucleic Acids Res. 50, W560–W567 (2022).

  • Oeller, M. et al. Sequence-based prediction of pH-dependent protein solubility using CamSol. Brief. Bioinform. 24, bbad004 (2023).

  • Zhang, G., Li, H. & Fang, B. Discriminating acidic and alkaline enzymes using a random forest model with secondary structure amino acid composition. Process Biochem. 44, 654–660 (2009).

  • Lin, H., Chen, W. & Ding, H. AcalPred: a sequence-based tool for discriminating between acidic and alkaline enzymes. PLoS ONE 8, e75726 (2013).

  • Fan, G.-L., Li, Q.-Z. & Zuo, Y.-C. Predicting acidic and alkaline enzymes by incorporating the average chemical shift and gene ontology informations into the general form of Chou’s PseAAC. Process Biochem. 48, 1048–1053 (2013).

  • Khan, Z. U., Hayat, M. & Khan, M. A. Discrimination of acidic and alkaline enzyme using Chou’s pseudo amino acid composition in conjunction with probabilistic neural network model. J. Theor. Biol. 365, 197–203 (2015).

  • Yan, S. & Wu, G. Predicting pH optimum for activity of beta-glucosidases. J. Biomed. Sci. Eng. 12, 354–367 (2019).

  • Wang, X., Li, H., Gao, P., Liu, Y. & Zeng, W. Combining support vector machine with dual g-gap dipeptides to discriminate between acidic and alkaline enzymes. Lett. Org. Chem. 16, 325–331 (2019).

  • Li, X. et al. A sequence embedding method for enzyme optimal condition analysis. BMC Bioinform. 21, 512 (2020).

  • Schomburg, I. et al. The BRENDA enzyme information system—from a database to an expert system. J. Biotechnol. 261, 194–206 (2017).

  • Puissant, J. et al. The pH optimum of soil exoenzymes adapt to long term changes in soil pH. Soil Biol. Biochem. 138, 107601 (2019).

  • Li, G. et al. Learning deep representations of enzyme thermal adaptation. Protein Sci. 31, e4480 (2022).

  • Reimer, L. C. et al. BacDive in 2022: the knowledge base for standardized bacterial and archaeal data. Nucleic Acids Res. 50, D741–D746 (2022).

  • Sayers, E. W. et al. Database resources of the National Center for Biotechnology Information. Nucleic Acids Res. 49, D10–D17 (2021).

  • Teufel, F. et al. SignalP 6.0 predicts all five types of signal peptides using protein language models. Nat. Biotechnol. 40, 1023–1025 (2022).

  • Booth, I. R. Regulation of cytoplasmic pH in bacteria. Microbiol. Rev. 49, 359–378 (1985).

  • Baker-Austin, C. & Dopson, M. Life in acid: pH homeostasis in acidophiles. Trends Microbiol. 15, 165–171 (2007).

  • Hough, D. W. & Danson, M. J. Extremozymes. Curr. Opin. Chem. Biol. 3, 39–46 (1999).

  • Tan, C. et al. A survey on deep transfer learning. In Artificial Neural Networks and Machine Learning–ICANN 2018: 27th International Conference on Artificial Neural Networks (eds Kůrková, V. et al.) 270–279 (Springer, 2018).

  • Gado, J. E., Beckham, G. T. & Payne, C. M. Improving enzyme optimum temperature prediction with resampling strategies and ensemble learning. J. Chem. Inf. Model. 60, 4098–4107 (2020).

  • Branco, P., Torgo, L. & Ribeiro, R. P. Pre-processing approaches for imbalanced distributions in regression. Neurocomputing 343, 76–99 (2019).

  • Yang, Y., Zha, K., Chen, Y.-C., Wang, H. & Katabi, D. Delving into deep imbalanced regression. In Proc. 38th International Conference on Machine Learning (eds Meila, M. and Zhang, T.) 11842–11851 (PMLR, 2021).

  • Chen, Z. et al. iFeature: a Python package and web server for features extraction and selection from protein and peptide sequences. Bioinformatics 34, 2499–2502 (2018).

  • Unsal, S. et al. Learning functional properties of proteins with language models. Nat. Mach. Intell. 4, 227–245 (2022).

  • Rives, A. et al. Biological structure and function emerge from scaling unsupervised learning to 250 million protein sequences. Proc. Natl Acad. Sci. USA 118, e2016239118 (2021).

  • Meier, J. et al. Language models enable zero-shot prediction of the effects of mutations on protein function. In Advances in Neural Information Processing Systems (eds Ranzato, M. et al.) 29287–29303 (Curran Associates, 2021).

  • Elnaggar, A. et al. ProtTrans: toward understanding the language of life through self-supervised learning. IEEE Trans. Pattern Anal. Mach. Intell. 44, 7112–7127 (2022).

  • Notin, P. et al. Tranception: protein fitness prediction with autoregressive transformers and inference-time retrieval. In Proc. 39th International Conference on Machine Learning (eds Chaudhuri, K. et al.) 16990–17017 (PMLR, 2022).

  • Nijkamp, E., Ruffolo, J., Weinstein, E. N., Naik, N. & Madani, A. ProGen2: exploring the boundaries of protein language models. Cell Syst. 14, 968–978.e3 (2023).

  • Yang, K. K., Lu, A. X. & Fusi, N. Convolutions are competitive with transformers for protein sequence pretraining. Cell Syst. 15, 286–294 (2024).

  • Stärk, H., Dallago, C., Heinzinger, M. & Rost, B. Light attention predicts protein location from the language of life. Bioinform. Adv. 1, vbab035 (2021).

  • Bergstra, J. & Bengio, Y. Random search for hyper-parameter optimization. J. Mach. Learn. Res. 13, 281–305 (2012).

  • Li, G. et al. Performance of regression models as a function of experiment noise. Bioinform. Biol. Insights 15, 11779322211020315 (2021).

  • Detlefsen, N. S., Hauberg, S. & Boomsma, W. Learning meaningful representations of protein sequences. Nat. Commun. 13, 1914 (2022).

  • Kroll, A. & Lercher, M. J. Machine learning models for the prediction of enzyme properties should be tested on proteins not used for model training. Preprint at bioRxiv https://doi.org/10.1101/2023.02.06.526991 (2023).

  • Suplatov, D. et al. Computational design of a pH stable enzyme: understanding molecular mechanism of penicillin acylase’s adaptation to alkaline conditions. PLoS ONE 9, e100643 (2014).

  • Huang, Y., Krauss, G., Cottaz, S., Driguez, H. & Lipps, G. A highly acid-stable and thermostable endo-β-glucanase from the thermoacidophilic archaeon Sulfolobus solfataricus. Biochem. J. 385, 581–588 (2005).

  • Mamo, G., Thunnissen, M., Hatti-Kaul, R. & Mattiasson, B. An alkaline active xylanase: insights into mechanisms of high pH catalytic adaptation. Biochimie 91, 1187–1196 (2009).

  • Wang, Y., Xu, M., Yang, T., Zhang, X. & Rao, Z. Surface charge-based rational design of aspartase modifies the optimal pH for efficient β-aminobutyric acid production. Int. J. Biol. Macromol. 164, 4165–4172 (2020).

  • Jakob, F. et al. Surface charge engineering of a Bacillus gibsonii subtilisin protease. Appl. Microbiol. Biotechnol. 97, 6793–6802 (2013).

  • Yang, T. et al. N20D/N116E combined mutant downward shifted the pH optimum of Bacillus subtilis NADH oxidase. Biology 12, 522 (2023).

  • Masui, A., Fujiwara, N., Yamamoto, K., Takagi, M. & Imanaka, T. Rational design for stabilization and optimum pH shift of serine protease AprN. J. Ferment. Bioeng. 85, 30–36 (1998).

  • Turunen, O., Vuorio, M., Fenel, F. & Leisola, M. Engineering of multiple arginines into the Ser/Thr surface of Trichoderma reesei endo-1,4-β-xylanase II increases the thermotolerance and shifts the pH optimum towards alkaline pH. Protein Eng. 15, 141–145 (2002).

  • Li, Q., Jiang, T., Liu, R., Feng, X. & Li, C. Tuning the pH profile of β-glucuronidase by rational site-directed mutagenesis for efficient transformation of glycyrrhizin. Appl. Microbiol. Biotechnol. 103, 4813–4823 (2019).

  • Pokhrel, S., Joo, J. C. & Yoo, Y. J. Shifting the optimum pH of Bacillus circulans xylanase towards acidic side by introducing arginine. Biotechnol. Bioprocess Eng. 18, 35–42 (2013).

  • Carvalho, D. V., Pereira, E. M. & Cardoso, J. S. Machine learning interpretability: a survey on methods and metrics. Electronics 8, 832 (2019).

  • Olsson, M. H. M., Søndergaard, C. R., Rostkowski, M. & Jensen, J. H. PROPKA3: consistent treatment of internal and surface residues in empirical pKa predictions. J. Chem. Theory Comput. 7, 525–537 (2011).

  • Talley, K. & Alexov, E. On the pH-optimum of activity and stability of proteins. Proteins 78, 2699–2706 (2010).

  • Alexov, E. Numerical calculations of the pH of maximal protein stability. The effect of the sequence composition and three-dimensional structure. Eur. J. Biochem. 271, 173–185 (2004).

  • Yu, T. et al. Enzyme function prediction using contrastive learning. Science 379, 1358–1363 (2023).

  • Pak, M. A., Dovidchenko, N. V., Sharma, S. M. & Ivankov, D. N. New mega dataset combined with deep neural network makes a progress in predicting impact of mutation on protein stability. Preprint at bioRxiv https://doi.org/10.1101/2022.12.31.522396 (2023).

  • Kroll, A., Rousset, Y., Hu, X.-P., Liebrand, N. A. & Lercher, M. J. Turnover number predictions for kinetically uncharacterized enzymes using machine and deep learning. Nat. Commun. 14, 4139 (2023).

  • Gelman, S., Fahlberg, S. A., Heinzelman, P., Romero, P. A. & Gitter, A. Neural networks to learn protein sequence–function relationships from deep mutational scanning data. Proc. Natl Acad. Sci. USA 118, e2104878118 (2021).

  • Dallago, C. et al. FLIP: benchmark tasks in fitness landscape inference for proteins. Preprint at bioRxiv https://doi.org/10.1101/2021.11.09.467890 (2022).

  • Sledzieski, S. et al. Democratizing protein language models with parameter-efficient fine-tuning. Proc. Natl Acad. Sci. USA 121, e2405840121 (2024).

  • Xu, M. et al. PEER: a comprehensive and multi-task benchmark for protein sequence understanding. Adv. Neural Inf. Process. Syst. 35, 35156–35173 (2022).

  • Ferdous, S., Shihab, I. F. & Reuel, N. F. Effects of sequence features on machine-learned enzyme classification fidelity. Biochem. Eng. J. 187, 108612 (2022).

  • Li, G., Rabe, K. S., Nielsen, J. & Engqvist, M. K. M. Machine learning applied to predicting microorganism growth temperatures and enzyme catalytic optima. ACS Synth. Biol. 8, 1411–1420 (2019).

  • Liu, H., HaoChen, J. Z., Gaidon, A. & Ma, T. Self-supervised learning is more robust to dataset imbalance. Preprint at https://arxiv.org/abs/2110.05025 (2022).

  • Zaretckii, M., Buslaev, P., Kozlovskii, I., Morozov, A. & Popov, P. Approaching optimal pH enzyme prediction with large language models. ACS Synth. Biol. https://doi.org/10.1021/acssynbio.4c00465 (2024).

  • Song, Y. et al. Accurately predicting enzyme functions through geometric graph learning on ESMFold-predicted structures. Nat. Commun. 15, 8180 (2024).

  • Richardson, L. et al. MGnify: the microbiome sequence data analysis resource in 2023. Nucleic Acids Res. 51, D753–D759 (2023).

  • UniProt Consortium. UniProt: the Universal Protein Knowledgebase in 2023. Nucleic Acids Res. 51, D523–D531 (2023).

  • Hopf, T. A. et al. Mutation effects predicted from sequence co-variation. Nat. Biotechnol. 35, 128–135 (2017).

  • Hie, B. L. & Yang, K. K. Adaptive machine learning for protein engineering. Curr. Opin. Struct. Biol. 72, 145–152 (2022).

  • Yang, K. K., Wu, Z. & Arnold, F. H. Machine-learning-guided directed evolution for protein engineering. Nat. Methods 16, 687–694 (2019).

  • Hawkins-Hooker, A. et al. Generating functional protein variants with variational autoencoders. PLoS Comput. Biol. 17, e1008736 (2021).

  • Strokach, A. & Kim, P. M. Deep generative modeling for protein design. Curr. Opin. Struct. Biol. 72, 226–236 (2022).

  • Steinegger, M. & Söding, J. MMseqs2 enables sensitive protein sequence searching for the analysis of massive data sets. Nat. Biotechnol. 35, 1026–1028 (2017).

  • Cock, P. J. A. et al. Biopython: freely available Python tools for computational molecular biology and bioinformatics. Bioinformatics 25, 1422–1423 (2009).

  • Cui, Y., Jia, M., Lin, T.-Y., Song, Y. & Belongie, S. Class-balanced loss based on effective number of samples. In Proc. 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (IEEE, 2019).

  • Riesselman, A. J., Ingraham, J. B. & Marks, D. S. Deep generative models of genetic variation capture the effects of mutations. Nat. Methods 15, 816–822 (2018).

  • Guyon, I., Weston, J., Barnhill, S. & Vapnik, V. Gene selection for cancer classification using support vector machines. Mach. Learn. 46, 389–422 (2002).

  • Shin, J.-E. et al. Protein design and variant prediction using autoregressive generative models. Nat. Commun. 12, 2403 (2021).

  • van den Oord, A. et al. WaveNet: a generative model for raw audio. Preprint at https://arxiv.org/abs/1609.03499 (2016).

  • Cho, K. et al. Learning phrase representations using RNN encoder–decoder for statistical machine translation. In Proc. 2014 Conference on Empirical Methods in Natural Language Processing (eds Moschitti, A. et al.) 1724–1734 (Association for Computational Linguistics, 2014); https://doi.org/10.3115/v1/D14-1179

  • Pascanu, R., Mikolov, T. & Bengio, Y. On the difficulty of training recurrent neural networks. In Proc. 30th International Conference on Machine Learning (eds Dasgupta, S. & McAllester, D.) 1310–1318 (PMLR, 2013).

  • Hochreiter, S. & Schmidhuber, J. Long short-term memory. Neural Comput. 9, 1735–1780 (1997).

  • Alley, E. C., Khimulya, G., Biswas, S., AlQuraishi, M. & Church, G. M. Unified rational protein engineering with sequence-based deep representation learning. Nat. Methods 16, 1315–1322 (2019).

  • Pedregosa, F. et al. Scikit-learn: machine learning in Python. J. Mach. Learn. Res. 12, 2825–2830 (2011).

  • Paszke, A. et al. PyTorch: an imperative style, high-performance deep learning library. In Advances in Neural Information Processing Systems (eds Wallach, H. et al.) 8024–8035 (Curran Associates, 2019).

  • Varadi, M. et al. AlphaFold Protein Structure Database: massively expanding the structural coverage of protein-sequence space with high-accuracy models. Nucleic Acids Res. 50, D439–D444 (2022).

  • Mirdita, M. et al. ColabFold: making protein folding accessible to all. Nat. Methods 19, 679–682 (2022).

  • Shrake, A. & Rupley, J. A. Environment and exposure to solvent of protein atoms. Lysozyme and insulin. J. Mol. Biol. 79, 351–371 (1973).

  • Rost, B. & Sander, C. Conservation and prediction of solvent accessibility in protein families. Proteins 20, 216–226 (1994).

  • Savojardo, C., Manfredi, M., Martelli, P. L. & Casadio, R. Solvent accessibility of residues undergoing pathogenic variations in humans: from protein structures to protein sequences. Front. Mol. Biosci. 7, 626363 (2021).

  • Gado, J. E. et al. Machine learning prediction of enzyme optimal pH. Zenodo https://doi.org/10.5281/ZENODO.14252615 (2023).

  • Gado, J. jafetgado/EpHod: v1.0.0. Zenodo https://doi.org/10.5281/ZENODO.15015125 (2025).

  • Austin, H. P. et al. Characterization and engineering of a plastic-degrading aromatic polyesterase. Proc. Natl Acad. Sci. USA 115, E4350–E4357 (2018).

    2025-04-29 00:00:00
