A Coding Implementation for Creating, Annotating, and Visualizing Complex Biological Knowledge Graphs Using PyBEL


In this tutorial, we explore how to use the PyBEL ecosystem to build and analyze rich biological knowledge graphs within Google Colab. We begin by installing all the necessary packages, including PyBEL, NetworkX, Matplotlib, Seaborn, and Pandas. We then show how to define proteins, processes, and modifications using the PyBEL DSL. From there, we walk through the creation of an Alzheimer's disease pathway, showing how to encode causal relationships, protein-protein interactions, and phosphorylation events. Beyond graph construction, we cover advanced network analyses, including centrality measures, node classification, and subgraph extraction, as well as techniques for extracting citations and evidence. By the end of this tutorial, you will have a fully annotated BEL graph ready for visualization and enrichment analyses, which lays a solid foundation for interactive exploration of biological knowledge.

!pip install pybel pybel-tools networkx matplotlib seaborn pandas -q


import pybel
import pybel.dsl as dsl
from pybel import BELGraph
from pybel.io import to_pickle, from_pickle
import networkx as nx
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
from collections import Counter
import warnings
warnings.filterwarnings('ignore')


print("PyBEL Advanced Tutorial: Biological Expression Language Ecosystem")
print("=" * 65)

We begin by installing PyBEL and its dependencies directly in Colab, ensuring that all the necessary libraries (NetworkX, Matplotlib, Seaborn, and Pandas) are available for our analysis. Once they are installed, we import the core modules and suppress warnings to keep the notebook clean and focused on results.
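Before running the rest of the notebook, it can help to confirm that the installation step actually succeeded. The following is a minimal sketch that uses only the standard library; the helper name `installed_versions` is our own and is not part of PyBEL.

```python
from importlib import metadata


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if missing."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


# The package list mirrors the pip command used in this tutorial.
required = ["pybel", "networkx", "matplotlib", "seaborn", "pandas"]
for name, version in installed_versions(required).items():
    print(f"{name}: {version or 'NOT INSTALLED'}")
```

Any package reported as missing should be reinstalled before continuing, since the later cells import all of them.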

print("\n1. Building a Biological Knowledge Graph")
print("-" * 40)


graph = BELGraph(
   name="Alzheimer's Disease Pathway",
   version="1.0.0",
   description="Example pathway showing protein interactions in AD",
   authors="PyBEL Tutorial"
)


app = dsl.Protein(name="APP", namespace="HGNC")
abeta = dsl.Protein(name="Abeta", namespace="CHEBI")
tau = dsl.Protein(name="MAPT", namespace="HGNC")
gsk3b = dsl.Protein(name="GSK3B", namespace="HGNC")
inflammation = dsl.BiologicalProcess(name="inflammatory response", namespace="GO")
apoptosis = dsl.BiologicalProcess(name="apoptotic process", namespace="GO")




graph.add_increases(app, abeta, citation="PMID:12345678", evidence="APP cleavage produces Abeta")
graph.add_increases(abeta, inflammation, citation="PMID:87654321", evidence="Abeta triggers neuroinflammation")


tau_phosphorylated = dsl.Protein(name="MAPT", namespace="HGNC",
                               variants=[dsl.ProteinModification("Ph")])
graph.add_increases(gsk3b, tau_phosphorylated, citation="PMID:11111111", evidence="GSK3B phosphorylates tau")
graph.add_increases(tau_phosphorylated, apoptosis, citation="PMID:22222222", evidence="Hyperphosphorylated tau causes cell death")
graph.add_increases(inflammation, apoptosis, citation="PMID:33333333", evidence="Inflammation promotes apoptosis")


graph.add_association(abeta, tau, citation="PMID:44444444", evidence="Abeta and tau interact synergistically")


print(f"Created BEL graph with {graph.number_of_nodes()} nodes and {graph.number_of_edges()} edges")

We initialize a BELGraph with metadata describing the Alzheimer's disease pathway and define proteins and processes using the PyBEL DSL. By adding causal relationships, protein modifications, and associations, we build a network that captures the key molecular interactions.
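As the pathway grows, it may be easier to keep the statements in a table and add them in a loop. The sketch below illustrates that idea on a plain NetworkX `MultiDiGraph`, the class that `BELGraph` builds on. It is a stand-in, not PyBEL itself: nodes are simple strings, and the names `STATEMENTS` and `build_graph` are hypothetical. The citations are the placeholder PMIDs used in this tutorial.

```python
import networkx as nx

# Hypothetical statement table: (source, relation, target, citation, evidence).
STATEMENTS = [
    ("APP", "increases", "Abeta", "PMID:12345678", "APP cleavage produces Abeta"),
    ("Abeta", "increases", "inflammatory response", "PMID:87654321", "Abeta triggers neuroinflammation"),
    ("GSK3B", "increases", "MAPT-Ph", "PMID:11111111", "GSK3B phosphorylates tau"),
    ("MAPT-Ph", "increases", "apoptotic process", "PMID:22222222", "Hyperphosphorylated tau causes cell death"),
    ("inflammatory response", "increases", "apoptotic process", "PMID:33333333", "Inflammation promotes apoptosis"),
    ("Abeta", "association", "MAPT", "PMID:44444444", "Abeta and tau interact synergistically"),
]


def build_graph(statements):
    """Add one annotated edge per statement to a multi-edge directed graph."""
    graph = nx.MultiDiGraph(name="Alzheimer's Disease Pathway")
    for source, relation, target, citation, evidence in statements:
        graph.add_edge(source, target, relation=relation,
                       citation=citation, evidence=evidence)
    return graph


toy = build_graph(STATEMENTS)
print(f"{toy.number_of_nodes()} nodes, {toy.number_of_edges()} edges")
```

Note that phosphorylated tau (`MAPT-Ph`) and unmodified tau (`MAPT`) are separate nodes here, mirroring how the tutorial defines `tau_phosphorylated` as its own entity.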

print("\n2. Advanced Network Analysis")
print("-" * 30)


degree_centrality = nx.degree_centrality(graph)
betweenness_centrality = nx.betweenness_centrality(graph)
closeness_centrality = nx.closeness_centrality(graph)


most_central = max(degree_centrality, key=degree_centrality.get)
print(f"Most connected node: {most_central}")
print(f"Degree centrality: {degree_centrality[most_central]:.3f}")

We compute degree, betweenness, and closeness centrality to quantify the importance of each node within the graph. By identifying the most connected node, we gain insight into potential hubs that may drive disease mechanisms.
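To make the degree centrality number concrete: NetworkX defines it as a node's degree divided by the number of other nodes, and in a directed graph the degree counts both incoming and outgoing edges. The sketch below checks that arithmetic on a string-labelled stand-in for the pathway (an assumption made so the example runs without PyBEL).

```python
import networkx as nx

# String-labelled stand-in for the tutorial's pathway edges.
toy = nx.MultiDiGraph()
toy.add_edges_from([
    ("APP", "Abeta"),
    ("Abeta", "inflammation"),
    ("GSK3B", "MAPT-Ph"),
    ("MAPT-Ph", "apoptosis"),
    ("inflammation", "apoptosis"),
    ("Abeta", "MAPT"),
])

n_nodes = toy.number_of_nodes()

# Degree centrality by hand: (in-degree + out-degree) / (n - 1).
manual = {node: toy.degree(node) / (n_nodes - 1) for node in toy.nodes()}
library = nx.degree_centrality(toy)

hub = max(library, key=library.get)
print(f"Hub: {hub}, centrality {library[hub]:.3f}")
```

In this stand-in, `Abeta` has one incoming and two outgoing edges among seven nodes, so its centrality is 3 / 6 = 0.5.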

print("\n3. Biological Entity Classification")
print("-" * 35)


node_types = Counter()
for node in graph.nodes():
   node_types[node.function] += 1


print("Node distribution:")
for func, count in node_types.items():
   print(f"  {func}: {count}")

We classify each node by its function, such as Protein or BiologicalProcess, and tally the counts. This breakdown helps us understand the composition of our network at a glance.

print("\n4. Pathway Analysis")
print("-" * 20)


proteins = [node for node in graph.nodes() if node.function == 'Protein']
processes = [node for node in graph.nodes() if node.function == 'BiologicalProcess']


print(f"Proteins in pathway: {len(proteins)}")
print(f"Biological processes: {len(processes)}")


edge_types = Counter()
for u, v, data in graph.edges(data=True):
   edge_types[data.get('relation')] += 1


print("\nRelationship types:")
for rel, count in edge_types.items():
   print(f"  {rel}: {count}")

We separate proteins from processes to gauge the scope and complexity of the pathway. Counting the different relationship types reveals which interactions, such as increases or associations, dominate our model.
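Raw counts are easier to compare once they are expressed as shares of the total. The following sketch uses `Counter.most_common` for that; the `relations` list is a hand-written assumption that mirrors the six statements added in this tutorial, not output read back from PyBEL.

```python
from collections import Counter

# Assumed relation labels, one per statement added to the graph.
relations = ["increases"] * 5 + ["association"]

edge_types = Counter(relations)
total = sum(edge_types.values())

# most_common() sorts by descending count, so the dominant relation comes first.
for relation, count in edge_types.most_common():
    print(f"  {relation}: {count} ({count / total:.0%})")

dominant, dominant_count = edge_types.most_common(1)[0]
```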

print("\n5. Literature Evidence Analysis")
print("-" * 32)


citations = []
evidences = []
for _, _, data in graph.edges(data=True):
   if 'citation' in data:
       citations.append(data['citation'])
   if 'evidence' in data:
       evidences.append(data['evidence'])


print(f"Total citations: {len(citations)}")
print(f"Unique citations: {len(set(citations))}")
print(f"Evidence statements: {len(evidences)}")

We extract the citation identifiers and evidence strings from each edge to assess how well our graph is grounded in published research. Comparing the total and unique citation counts lets us gauge the breadth of the supporting literature.
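Counting unique citations with `set` requires hashable values, and a citation stored as a dictionary is not hashable by default. One defensive option is to normalize every citation into a tuple first. The sketch below assumes a citation is either a `"NAMESPACE:identifier"` string or a dictionary with `namespace` and `identifier` keys; both the shapes and the helper name `citation_key` are assumptions for illustration.

```python
def citation_key(citation):
    """Return a hashable (namespace, identifier) key for a citation."""
    if isinstance(citation, dict):
        # Assumed dictionary layout: {"namespace": ..., "identifier": ...}.
        return (citation.get("namespace"), citation.get("identifier"))
    # Assumed string layout: "PMID:12345678".
    namespace, _, identifier = str(citation).partition(":")
    return (namespace.strip(), identifier.strip())


citations = [
    "PMID:12345678",
    {"namespace": "PMID", "identifier": "12345678"},  # same paper, other shape
    "PMID:87654321",
]

unique = {citation_key(c) for c in citations}
print(f"Total citations: {len(citations)}")
print(f"Unique citations: {len(unique)}")
```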

print("\n6. Subgraph Analysis")
print("-" * 22)


inflammation_nodes = [inflammation]
inflammation_neighbors = list(graph.predecessors(inflammation)) + list(graph.successors(inflammation))
inflammation_subgraph = graph.subgraph(inflammation_nodes + inflammation_neighbors)


print(f"Inflammation subgraph: {inflammation_subgraph.number_of_nodes()} nodes, {inflammation_subgraph.number_of_edges()} edges")

We isolate the inflammation subgraph by collecting the node's direct neighbors (its predecessors and successors), which yields a focused view of inflammatory signaling. Targeted subnetworks like this one highlight how inflammation interacts with other disease processes.
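The same one-hop neighborhood can also be obtained with NetworkX's `ego_graph`, which avoids assembling the neighbor lists by hand. The sketch below compares both approaches on a string-labelled stand-in graph (an assumption, so that it runs without PyBEL); the helper name `neighborhood` is our own.

```python
import networkx as nx

# String-labelled stand-in for the tutorial's pathway edges.
toy = nx.MultiDiGraph()
toy.add_edges_from([
    ("APP", "Abeta"),
    ("Abeta", "inflammation"),
    ("GSK3B", "MAPT-Ph"),
    ("MAPT-Ph", "apoptosis"),
    ("inflammation", "apoptosis"),
    ("Abeta", "MAPT"),
])


def neighborhood(graph, node):
    """Subgraph induced by a node plus its direct predecessors and successors."""
    members = {node, *graph.predecessors(node), *graph.successors(node)}
    return graph.subgraph(members)


manual = neighborhood(toy, "inflammation")

# undirected=True makes ego_graph follow edges in both directions.
ego = nx.ego_graph(toy, "inflammation", radius=1, undirected=True)

print(sorted(manual.nodes()), manual.number_of_edges())
```

Raising `radius` in `ego_graph` extends the view to second-order neighbors, which may be useful once the graph contains more than a handful of nodes.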

print("\n7. Advanced Graph Querying")
print("-" * 28)


try:
   paths = list(nx.all_simple_paths(graph, app, apoptosis, cutoff=3))
   print(f"Paths from APP to apoptosis: {len(paths)}")
   if paths:
       # all_simple_paths does not sort by length, so take the minimum explicitly
       print(f"Shortest path length: {min(len(p) for p in paths) - 1}")
except nx.NetworkXNoPath:
   print("No paths found between APP and apoptosis")


apoptosis_inducers = list(graph.predecessors(apoptosis))
print(f"Factors that increase apoptosis: {len(apoptosis_inducers)}")

We enumerate the simple paths between APP and apoptosis to explore mechanistic routes and identify key intermediates. Listing all predecessors of apoptosis also shows us which factors may trigger cell death.
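Two details of path queries are worth illustrating: `all_simple_paths` yields paths in no particular length order, and `shortest_path_length` raises `NetworkXNoPath` when the target is unreachable. The sketch below shows both on a string-labelled stand-in graph (an assumption, so that it runs without PyBEL); `shortest_hops` is a hypothetical helper.

```python
import networkx as nx

# String-labelled stand-in for the tutorial's pathway edges.
toy = nx.MultiDiGraph()
toy.add_edges_from([
    ("APP", "Abeta"),
    ("Abeta", "inflammation"),
    ("GSK3B", "MAPT-Ph"),
    ("MAPT-Ph", "apoptosis"),
    ("inflammation", "apoptosis"),
    ("Abeta", "MAPT"),
])


def shortest_hops(graph, source, target):
    """Number of edges on a shortest directed path, or None if unreachable."""
    try:
        return nx.shortest_path_length(graph, source, target)
    except nx.NetworkXNoPath:
        return None


# cutoff=3 keeps only paths with at most three edges.
paths = list(nx.all_simple_paths(toy, "APP", "apoptosis", cutoff=3))
for path in sorted(paths, key=len):
    print(" -> ".join(path))

print(shortest_hops(toy, "APP", "apoptosis"))
print(shortest_hops(toy, "apoptosis", "APP"))
```

In this stand-in, the only route from APP to apoptosis runs through Abeta and inflammation, because the edge from Abeta leads to unmodified tau rather than to the phosphorylated form.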

print("\n8. Data Export and Visualization")
print("-" * 35)


adj_matrix = nx.adjacency_matrix(graph)
node_labels = [str(node) for node in graph.nodes()]


plt.figure(figsize=(12, 8))


plt.subplot(2, 2, 1)
pos = nx.spring_layout(graph, k=2, iterations=50)
nx.draw(graph, pos, with_labels=False, node_color="lightblue",
       node_size=1000, font_size=8, font_weight="bold")
plt.title("BEL Network Graph")


plt.subplot(2, 2, 2)
centralities = list(degree_centrality.values())
plt.hist(centralities, bins=10, alpha=0.7, color="green")
plt.title("Degree Centrality Distribution")
plt.xlabel("Centrality")
plt.ylabel("Frequency")


plt.subplot(2, 2, 3)
functions = list(node_types.keys())
counts = list(node_types.values())
plt.pie(counts, labels=functions, autopct="%1.1f%%", startangle=90)
plt.title("Node Type Distribution")


plt.subplot(2, 2, 4)
relations = list(edge_types.keys())
rel_counts = list(edge_types.values())
plt.bar(relations, rel_counts, color="orange", alpha=0.7)
plt.title("Relationship Types")
plt.xlabel("Relation")
plt.ylabel("Count")
plt.xticks(rotation=45)


plt.tight_layout()
plt.show()

We prepare the adjacency matrix and node labels for downstream use and create a multi-panel figure showing the network structure, the degree centrality distribution, the node type proportions, and the edge type counts. These visualizations bring our BEL graph to life and support deeper biological interpretation.
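For export, a labelled table is often more convenient than a raw sparse matrix. One option is NetworkX's `to_pandas_adjacency`, sketched below on a string-labelled stand-in graph (an assumption, so that it runs without PyBEL). Rows are sources and columns are targets.

```python
import networkx as nx

# String-labelled stand-in for the tutorial's pathway edges.
toy = nx.MultiDiGraph()
toy.add_edges_from([
    ("APP", "Abeta"),
    ("Abeta", "inflammation"),
    ("GSK3B", "MAPT-Ph"),
    ("MAPT-Ph", "apoptosis"),
    ("inflammation", "apoptosis"),
    ("Abeta", "MAPT"),
])

# Fix the node order so rows and columns line up with the labels.
node_labels = [str(node) for node in toy.nodes()]
adjacency = nx.to_pandas_adjacency(toy, nodelist=list(toy.nodes()))

print(adjacency)
# adjacency.to_csv("adjacency.csv") would write the table to disk (file name is illustrative).
```

With real PyBEL nodes, the row and column labels would be the node objects themselves, so converting them to strings before export may be necessary.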

In this tutorial, we have demonstrated the power and flexibility of PyBEL for modeling complex biological systems. We showed how to build a graph of Alzheimer's disease interactions, run network-level analyses to identify key hub nodes, and extract biologically meaningful subgraphs for focused study. We also covered essential practices for extracting literature evidence and preparing data structures for compelling visualizations. As a next step, we encourage you to extend this framework to your own pathways, integrate additional omics data, run enrichment tests, or connect the graph to a machine learning workflow.


Sana Hassan, a consulting intern at Marktechpost and a dual-degree student at IIT Madras, is passionate about applying technology and AI to real-world challenges. With a keen interest in solving practical problems, Sana brings a fresh perspective to the intersection of AI and real-life solutions.