[2507.03318] Structure-Aware Compound-Protein Affinity Prediction via Graph Neural Network with Group Lasso Regularization



[Submitted on 4 Jul 2025 (v1), last revised 7 Oct 2025 (this version, v2)]

Article: “Structure-Aware Compound-Protein Affinity Prediction via Graph Neural Network with Group Lasso Regularization,” by Zhanyu Shi and 6 other authors


Abstract: Explainable artificial intelligence (XAI) methods have been increasingly applied in drug discovery to learn molecular representations and identify the substructures that drive property predictions. However, building end-to-end interpretable structure-activity relationship (SAR) models for compound property prediction faces many challenges, such as the limited compound-protein interaction activity data available for specific protein targets and the fact that subtle structural changes at substitution sites can significantly affect molecular properties. We exploit activity cliff pairs: molecules that share a scaffold but differ at substitution sites and show significant differences in potency against a specific protein target. We propose a framework that uses graph neural networks (GNNs) to leverage property and structure information from activity cliff pairs to predict compound-protein affinity (i.e., half-maximal inhibitory concentration, IC50). To enhance model performance and interpretability, we train GNNs with structure-aware loss functions using group lasso and sparse group lasso regularization, which prune and highlight molecular subgraphs relevant to activity differences. We applied this framework to activity cliff data for molecules targeting three Src tyrosine-protein kinases (PDB IDs: 1O42, 2H8H, 4MXO). Our approach improved property prediction by incorporating common and uncommon node information with sparse group lasso, as reflected in a reduced root mean square error (RMSE) and an improved Pearson correlation coefficient (PCC). Applying regularization also improves GNN feature attribution, raising global direction scores at the graph level and atom-coloring accuracy at the atom level. These advances enhance model interpretability in drug discovery pipelines, especially for identifying molecular substructures important in lead optimization.
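As a rough illustration of the regularizers and metrics named in the abstract, the sketch below computes a group lasso penalty and a sparse group lasso penalty in their common textbook form, λ[(1 − α) Σ_g √p_g ‖β_g‖₂ + α ‖β‖₁], adds the latter to a mean squared error loss, and evaluates RMSE and PCC. It is a generic sketch, not the authors' implementation: the abstract does not specify the exact loss, so the grouping (for example, hypothetical atom-index sets for shared-scaffold versus substituent nodes), the function names, and the `lam`/`alpha` defaults are all assumptions.

```python
import numpy as np


def group_lasso_penalty(weights, groups):
    """Sum over groups of sqrt(group size) * L2 norm of that group's weights."""
    weights = np.asarray(weights, dtype=float)
    return float(sum(np.sqrt(len(idx)) * np.linalg.norm(weights[idx]) for idx in groups))


def sparse_group_lasso_penalty(weights, groups, lam=1.0, alpha=0.5):
    """lam * ((1 - alpha) * group lasso + alpha * L1); alpha = 0 gives plain group lasso."""
    weights = np.asarray(weights, dtype=float)
    l1 = np.abs(weights).sum()
    return float(lam * ((1.0 - alpha) * group_lasso_penalty(weights, groups) + alpha * l1))


def regularized_mse_loss(y_true, y_pred, weights, groups, lam=0.01, alpha=0.5):
    """Mean squared error on affinity targets plus a sparse group lasso term.

    Assumption: `weights` stands in for whatever node-level parameters or
    masks the regularizer acts on; the paper's exact choice is not given here.
    """
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    mse = float(np.mean((y_true - y_pred) ** 2))
    return mse + sparse_group_lasso_penalty(weights, groups, lam=lam, alpha=alpha)


def rmse(y_true, y_pred):
    """Root mean square error between targets and predictions."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


def pcc(y_true, y_pred):
    """Pearson correlation coefficient between targets and predictions."""
    return float(np.corrcoef(y_true, y_pred)[0, 1])


# Hypothetical example: four node weights split into two atom groups.
w = np.array([0.5, -0.5, 0.0, 2.0])
groups = [[0, 1], [2, 3]]
print(group_lasso_penalty(w, groups))         # 1 + 2*sqrt(2), about 3.828
print(sparse_group_lasso_penalty(w, groups))  # 2 + sqrt(2), about 3.414
```

With `alpha=0` the penalty reduces to plain group lasso, which tends to zero out whole groups at once; increasing `alpha` adds within-group (L1) sparsity, which is what lets sparse group lasso keep a subgraph while still highlighting individual atoms inside it.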