Building a BioCypher-Powered AI Agent for Biomedical Knowledge Graph Generation and Querying

In this tutorial, we implement the BioCypher AI Agent, a tool for building, querying, and analyzing biomedical knowledge graphs. By combining the strengths of BioCypher, a high-performance, schema-based interface for biological data integration, with the flexibility of NetworkX, this tutorial lets users simulate complex biological relationships such as gene-disease associations, drug-target interactions, and pathway involvement. The agent also includes capabilities for generating synthetic biomedical data, visualizing knowledge graphs, and running intelligent queries such as centrality analysis and neighbor detection.

!pip install biocypher pandas numpy networkx matplotlib seaborn


import pandas as pd
import numpy as np
import networkx as nx
import matplotlib.pyplot as plt
import json
import random
from typing import Dict, List, Tuple, Any

We begin by installing the essential Python libraries required for biomedical graph analysis: BioCypher, Pandas, NumPy, NetworkX, Matplotlib, and Seaborn. These packages let us handle data, create and manipulate knowledge graphs, and visualize relationships. Once they are installed, we import the modules needed to set up our development environment.

try:
   from biocypher import BioCypher
   BIOCYPHER_AVAILABLE = True
except ImportError:
   print("BioCypher not available, using NetworkX-only implementation")
   BIOCYPHER_AVAILABLE = False

We try to import the BioCypher framework, which provides a schema-based interface for managing biomedical knowledge graphs. If the import succeeds, we enable BioCypher features; otherwise, we fall back gracefully to a NetworkX-only mode, so the rest of the analysis can still proceed without interruption.
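
The same fallback decision can also be made without importing the package at all. The following is a minimal sketch using only the standard library; the helper name is_available and the backend variable are hypothetical and are not part of the tutorial's code.

```python
import importlib.util


def is_available(module_name: str) -> bool:
    """Return True if a top-level module can be located, without importing it."""
    return importlib.util.find_spec(module_name) is not None


# Choose a backend label in the same spirit as the BIOCYPHER_AVAILABLE flag.
backend = "biocypher" if is_available("biocypher") else "networkx-only"
print(f"Selected backend: {backend}")
```

A locate-only check does not catch a package that is installed but fails while importing, so wrapping the real import in try/except, as the tutorial does, remains the more robust choice.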

class BiomedicalAIAgent:
   """Advanced AI Agent for biomedical knowledge graph analysis using BioCypher"""
  
   def __init__(self):
       if BIOCYPHER_AVAILABLE:
           try:
               self.bc = BioCypher()
               self.use_biocypher = True
           except Exception as e:
               print(f"BioCypher initialization failed: {e}")
               self.use_biocypher = False
       else:
           self.use_biocypher = False
          
       self.graph = nx.Graph()
       self.entities = {}
       self.relationships = []
       self.knowledge_base = self._initialize_knowledge_base()
      
   def _initialize_knowledge_base(self) -> Dict[str, List[str]]:
       """Initialize sample biomedical knowledge base"""
       return {
           "genes": ["BRCA1", "TP53", "EGFR", "KRAS", "MYC", "PIK3CA", "PTEN"],
           "diseases": ["breast_cancer", "lung_cancer", "diabetes", "alzheimer", "heart_disease"],
           "drugs": ["aspirin", "metformin", "doxorubicin", "paclitaxel", "imatinib"],
           "pathways": ["apoptosis", "cell_cycle", "DNA_repair", "metabolism", "inflammation"],
           "proteins": ["p53", "EGFR", "insulin", "hemoglobin", "collagen"]
       }
  
   def generate_synthetic_data(self, n_entities: int = 50) -> None:
       """Generate synthetic biomedical data for demonstration"""
       print("🧬 Generating synthetic biomedical data...")
      
       for entity_type, items in self.knowledge_base.items():
           for item in items:
               entity_id = f"{entity_type}_{item}"
               self.entities[entity_id] = {
                   "id": entity_id,
                   "type": entity_type,
                   "name": item,
                   "properties": self._generate_properties(entity_type)
               }
      
       entity_ids = list(self.entities.keys())
       for _ in range(n_entities):
           source = random.choice(entity_ids)
           target = random.choice(entity_ids)
           if source != target:
               rel_type = self._determine_relationship_type(
                   self.entities[source]["type"],
                   self.entities[target]["type"]
               )
               self.relationships.append({
                   "source": source,
                   "target": target,
                   "type": rel_type,
                   "confidence": random.uniform(0.5, 1.0)
               })

We define the BiomedicalAIAgent class as the core engine for analyzing biomedical knowledge graphs with BioCypher. In the constructor, we check whether BioCypher is available and initialize it if possible; otherwise, we default to a NetworkX-only approach. We also set up the base structures: an empty graph, containers for entities and relationships, and a predefined biomedical knowledge base. We then use generate_synthetic_data() to populate the agent with well-known biological entities, such as genes, diseases, drugs, pathways, and proteins, and to simulate their interactions through biologically meaningful relationships. Note that n_entities sets the number of random pairing attempts, not the number of entities; attempts that pick the same entity twice are skipped, so the final relationship count can be lower.
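
To make the sampling step easier to reason about, here is a minimal, self-contained sketch of the same idea with a fixed seed, so repeated runs produce the same pairs. The names RELATIONSHIP_MAP, relationship_type, and sample_pairs are hypothetical helpers written for this illustration, and the map contains only two of the tutorial's entries.

```python
import random
from typing import List, Tuple

# Same pair-to-label idea as the tutorial; only two entries are shown here.
RELATIONSHIP_MAP = {
    ("genes", "diseases"): "associated_with",
    ("drugs", "diseases"): "treats",
}


def relationship_type(source_type: str, target_type: str) -> str:
    """Look up a label for the pair in either order, defaulting to 'related_to'."""
    return RELATIONSHIP_MAP.get(
        (source_type, target_type),
        RELATIONSHIP_MAP.get((target_type, source_type), "related_to"),
    )


def sample_pairs(ids: List[str], n_attempts: int, seed: int = 0) -> List[Tuple[str, str]]:
    """Draw n_attempts random (source, target) pairs, discarding self-pairs."""
    rng = random.Random(seed)  # private generator: same seed, same pairs
    pairs = []
    for _ in range(n_attempts):
        source, target = rng.choice(ids), rng.choice(ids)
        if source != target:
            pairs.append((source, target))
    return pairs


ids = ["genes_TP53", "diseases_lung_cancer", "drugs_imatinib"]
pairs = sample_pairs(ids, n_attempts=10, seed=42)
print(len(pairs), "pairs kept out of 10 attempts")
print(relationship_type("diseases", "genes"))  # order does not matter: associated_with
```

Because the lookup works in both orders, a relationship labeled associated_with may have the gene as either its source or its target, which matters for any query that inspects only one end.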

   def _generate_properties(self, entity_type: str) -> Dict[str, Any]:
       """Generate realistic properties for different entity types"""
       base_props = {"created_at": "2024-01-01", "source": "synthetic"}
      
       if entity_type == "genes":
           base_props.update({
               "chromosome": f"chr{random.randint(1, 22)}",
               "expression_level": random.uniform(0.1, 10.0),
               "mutation_frequency": random.uniform(0.01, 0.3)
           })
       elif entity_type == "diseases":
           base_props.update({
               "prevalence": random.uniform(0.001, 0.1),
               "severity": random.choice(["mild", "moderate", "severe"]),
               "age_of_onset": random.randint(20, 80)
           })
       elif entity_type == "drugs":
           base_props.update({
               "dosage": f"{random.randint(10, 500)}mg",
               "efficacy": random.uniform(0.3, 0.95),
               "side_effects": random.randint(1, 10)
           })
      
       return base_props


   def _determine_relationship_type(self, source_type: str, target_type: str) -> str:
       """Determine biologically meaningful relationship types"""
       relationships_map = {
           ("genes", "diseases"): "associated_with",
           ("genes", "drugs"): "targeted_by",
           ("genes", "pathways"): "participates_in",
           ("drugs", "diseases"): "treats",
           ("proteins", "pathways"): "involved_in",
           ("diseases", "pathways"): "disrupts"
       }
      
       return relationships_map.get((source_type, target_type),
                                  relationships_map.get((target_type, source_type), "related_to"))


   def build_knowledge_graph(self) -> None:
       """Build knowledge graph using BioCypher or NetworkX"""
       print("🔗 Building knowledge graph...")
      
       if self.use_biocypher:
           try:
               for entity_id, entity_data in self.entities.items():
                   self.bc.add_node(
                       node_id=entity_id,
                       node_label=entity_data["type"],
                       node_properties=entity_data["properties"]
                   )
                  
               for rel in self.relationships:
                   self.bc.add_edge(
                       source_id=rel["source"],
                       target_id=rel["target"],
                       edge_label=rel["type"],
                       edge_properties={"confidence": rel["confidence"]}
                   )
               print("✅ BioCypher graph built successfully")
           except Exception as e:
               print(f"BioCypher build failed, using NetworkX only: {e}")
               self.use_biocypher = False
          
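       for entity_id, entity_data in self.entities.items():
           # Flatten the nested "properties" dict so every node attribute is a scalar;
           # nx.write_graphml rejects dict-valued attributes.
           flat = {k: v for k, v in entity_data.items() if k != "properties"}
           self.graph.add_node(entity_id, **flat, **entity_data["properties"])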
          
       for rel in self.relationships:
           self.graph.add_edge(rel["source"], rel["target"],
                             type=rel["type"], confidence=rel["confidence"])
      
       print(f"✅ NetworkX graph built with {len(self.graph.nodes())} nodes and {len(self.graph.edges())} edges")


   def intelligent_query(self, query_type: str, entity: str = None) -> Dict[str, Any]:
       """Intelligent querying system with multiple analysis types"""
       print(f"🤖 Processing intelligent query: {query_type}")
      
       if query_type == "drug_targets":
           return self._find_drug_targets()
       elif query_type == "disease_genes":
           return self._find_disease_associated_genes()
       elif query_type == "pathway_analysis":
           return self._analyze_pathways()
       elif query_type == "centrality_analysis":
           return self._analyze_network_centrality()
       elif query_type == "entity_neighbors" and entity:
           return self._find_entity_neighbors(entity)
       else:
           return {"error": "Unknown query type"}


   def _find_drug_targets(self) -> Dict[str, List[str]]:
       """Find potential drug targets"""
       drug_targets = {}
       for rel in self.relationships:
           if rel["type"] != "targeted_by":
               continue
           # Pairs are sampled in random order, so the gene may be on either end
           ends = (self.entities[rel["source"]], self.entities[rel["target"]])
           gene = next(e for e in ends if e["type"] == "genes")
           drug = next(e for e in ends if e["type"] == "drugs")
           drug_targets.setdefault(drug["name"], []).append(gene["name"])
       return drug_targets


   def _find_disease_associated_genes(self) -> Dict[str, List[str]]:
       """Find genes associated with diseases"""
       disease_genes = {}
       for rel in self.relationships:
           if rel["type"] != "associated_with":
               continue
           # Pairs are sampled in random order, so the disease may be on either end
           ends = (self.entities[rel["source"]], self.entities[rel["target"]])
           gene = next(e for e in ends if e["type"] == "genes")
           disease = next(e for e in ends if e["type"] == "diseases")
           disease_genes.setdefault(disease["name"], []).append(gene["name"])
       return disease_genes


   def _analyze_pathways(self) -> Dict[str, int]:
       """Analyze pathway connectivity"""
       pathway_connections = {}
       for rel in self.relationships:
           if rel["type"] in ["participates_in", "involved_in"]:
               # Pairs are sampled in random order, so check both ends for the pathway
               for end in (rel["source"], rel["target"]):
                   if self.entities[end]["type"] == "pathways":
                       pathway = self.entities[end]["name"]
                       pathway_connections[pathway] = pathway_connections.get(pathway, 0) + 1
       return dict(sorted(pathway_connections.items(), key=lambda x: x[1], reverse=True))


   def _analyze_network_centrality(self) -> Dict[str, Dict[str, float]]:
       """Analyze network centrality measures"""
       if len(self.graph.nodes()) == 0:
           return {}
          
       centrality_measures = {
           "degree": nx.degree_centrality(self.graph),
           "betweenness": nx.betweenness_centrality(self.graph),
           "closeness": nx.closeness_centrality(self.graph)
       }
      
       top_nodes = {}
       for measure, values in centrality_measures.items():
           top_nodes[measure] = dict(sorted(values.items(), key=lambda x: x[1], reverse=True)[:5])
      
       return top_nodes


   def _find_entity_neighbors(self, entity_name: str) -> Dict[str, List[str]]:
       """Find neighbors of a specific entity"""
       neighbors = {"direct": [], "indirect": []}
       entity_id = None
      
       for eid, edata in self.entities.items():
           if edata["name"].lower() == entity_name.lower():
               entity_id = eid
               break
              
       if not entity_id or entity_id not in self.graph:
           return {"error": f"Entity '{entity_name}' not found"}
          
       for neighbor in self.graph.neighbors(entity_id):
           neighbors["direct"].append(self.entities[neighbor]["name"])
          
       for direct_neighbor in self.graph.neighbors(entity_id):
           for indirect_neighbor in self.graph.neighbors(direct_neighbor):
               if (indirect_neighbor != entity_id and
                   indirect_neighbor not in list(self.graph.neighbors(entity_id))):
                   neighbor_name = self.entities[indirect_neighbor]["name"]
                   if neighbor_name not in neighbors["indirect"]:
                       neighbors["indirect"].append(neighbor_name)
                      
       return neighbors


   def visualize_network(self, max_nodes: int = 30) -> None:
       """Visualize the knowledge graph"""
       print("📊 Creating network visualization...")
      
       nodes_to_show = list(self.graph.nodes())[:max_nodes]
       subgraph = self.graph.subgraph(nodes_to_show)
      
       plt.figure(figsize=(12, 8))
       pos = nx.spring_layout(subgraph, k=2, iterations=50)
      
       node_colors = []
       color_map = {"genes": "red", "diseases": "blue", "drugs": "green",
                   "pathways": "orange", "proteins": "purple"}
      
       for node in subgraph.nodes():
           entity_type = self.entities[node]["type"]
           node_colors.append(color_map.get(entity_type, "gray"))
      
       nx.draw(subgraph, pos, node_color=node_colors, node_size=300,
               with_labels=False, alpha=0.7, edge_color="gray", width=0.5)
      
       plt.title("Biomedical Knowledge Graph Network")
       plt.axis('off')
       plt.tight_layout()
       plt.show()

We design a set of query functions inside the BiomedicalAIAgent class to simulate real-world biomedical scenarios. We generate realistic properties for each entity type, define biologically meaningful relationship types, and build a structured knowledge graph using either BioCypher or NetworkX. To extract insights, we include functions for analyzing drug targets, disease-gene associations, pathway connectivity, and network centrality, along with a graph visualizer that helps us see the interactions between biomedical entities.
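
To show what the centrality and neighbor queries compute, here is a minimal pure-Python sketch on a four-node toy graph. The adjacency data and the function names are illustrative assumptions; the tutorial itself delegates these computations to NetworkX. For graphs with more than one node, the degree centrality below uses the same degree / (n - 1) normalization as NetworkX; the single-node case is handled by a convention chosen only for this sketch.

```python
from typing import Dict, Set

# Tiny undirected graph as an adjacency mapping (illustrative data only).
adjacency: Dict[str, Set[str]] = {
    "TP53": {"apoptosis", "lung_cancer"},
    "apoptosis": {"TP53", "EGFR"},
    "lung_cancer": {"TP53"},
    "EGFR": {"apoptosis"},
}


def degree_centrality(adj: Dict[str, Set[str]]) -> Dict[str, float]:
    """Degree of each node divided by (n - 1), where n is the node count."""
    n = len(adj)
    if n <= 1:
        # Convention of this sketch only: no other node to connect to.
        return {node: 0.0 for node in adj}
    return {node: len(neighbors) / (n - 1) for node, neighbors in adj.items()}


def two_hop_neighbors(adj: Dict[str, Set[str]], node: str) -> Set[str]:
    """Nodes reachable in exactly two steps that are not direct neighbors."""
    direct = adj[node]
    indirect: Set[str] = set()
    for neighbor in direct:
        indirect |= adj[neighbor]
    return indirect - direct - {node}


centrality = degree_centrality(adjacency)
print(sorted(centrality.items(), key=lambda item: item[1], reverse=True))
print(two_hop_neighbors(adjacency, "TP53"))  # {'EGFR'}
```

Using sets for the direct neighbors keeps the membership test cheap, which is one way to avoid rebuilding the neighbor list inside the inner loop.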

   def run_analysis_pipeline(self) -> None:
       """Run complete analysis pipeline"""
       print("🚀 Starting BioCypher AI Agent Analysis Pipeline\n")
      
       self.generate_synthetic_data()
       self.build_knowledge_graph()
      
       print(f"📈 Graph Statistics:")
       print(f"   Entities: {len(self.entities)}")
       print(f"   Relationships: {len(self.relationships)}")
       print(f"   Graph Nodes: {len(self.graph.nodes())}")
       print(f"   Graph Edges: {len(self.graph.edges())}\n")
      
       analyses = [
           ("drug_targets", "Drug Target Analysis"),
           ("disease_genes", "Disease-Gene Associations"),
           ("pathway_analysis", "Pathway Connectivity Analysis"),
           ("centrality_analysis", "Network Centrality Analysis")
       ]
      
       for query_type, title in analyses:
           print(f"🔍 {title}:")
           results = self.intelligent_query(query_type)
           self._display_results(results)
           print()
      
       self.visualize_network()
      
       print("✅ Analysis complete! AI Agent successfully analyzed biomedical data.")
      
   def _display_results(self, results: Dict[str, Any], max_items: int = 5) -> None:
       """Display analysis results in a formatted way"""
       if isinstance(results, dict) and "error" not in results:
           for key, value in list(results.items())[:max_items]:
               if isinstance(value, list):
                   print(f"   {key}: {', '.join(value[:3])}{'...' if len(value) > 3 else ''}")
               elif isinstance(value, dict):
                   print(f"   {key}: {dict(list(value.items())[:3])}")
               else:
                   print(f"   {key}: {value}")
       else:
           print(f"   {results}")


   def export_to_formats(self) -> None:
       """Export knowledge graph to various formats"""
       if self.use_biocypher:
           try:
               # Placeholder: no BioCypher writer is called here; only status messages are printed
               print("📤 Exporting BioCypher graph...")
               print("✅ BioCypher export completed")
           except Exception as e:
               print(f"BioCypher export failed: {e}")
      
       print("📤 Exporting NetworkX graph to formats...")
      
       graph_data = {
           "nodes": [{"id": n, **self.graph.nodes[n]} for n in self.graph.nodes()],
           "edges": [{"source": u, "target": v, **self.graph.edges[u, v]}
                    for u, v in self.graph.edges()]
       }
      
       try:
           with open("biomedical_graph.json", "w") as f:
               json.dump(graph_data, f, indent=2, default=str)
          
           nx.write_graphml(self.graph, "biomedical_graph.graphml")
           print("✅ Graph exported to JSON and GraphML formats")
       except Exception as e:
           print(f"Export failed: {e}")


We wrap up the AI agent workflow with the run_analysis_pipeline() function, which ties everything together, from generating synthetic data and building the knowledge graph to running intelligent queries and producing the final visualization. This automated pipeline lets us observe biomedical relationships, analyze central entities, and understand how different biological concepts are interconnected. Finally, export_to_formats() saves the resulting graph in JSON and GraphML formats for further use, making our analysis shareable and reproducible. The pipeline does not call export_to_formats() itself, so it has to be invoked separately on the agent.
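
One detail worth keeping in mind when exporting: GraphML stores scalar attribute values, so a nested dictionary such as an entity's properties generally has to be flattened before it is written. The sketch below illustrates the flattening and a JSON round trip using only the standard library; flatten_entity and the sample entity are hypothetical and written only for this example.

```python
import json
import os
import tempfile
from typing import Any, Dict


def flatten_entity(entity: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy with the nested 'properties' dict merged into the top level."""
    flat = {k: v for k, v in entity.items() if k != "properties"}
    flat.update(entity.get("properties", {}))
    return flat


entity = {
    "id": "genes_TP53",
    "type": "genes",
    "name": "TP53",
    "properties": {"chromosome": "chr17", "source": "synthetic"},
}
flat = flatten_entity(entity)

# Round-trip through a temporary JSON file to confirm nothing is lost.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "graph.json")
    with open(path, "w") as f:
        json.dump({"nodes": [flat], "edges": []}, f, indent=2)
    with open(path) as f:
        loaded = json.load(f)

print(loaded["nodes"][0])
```

Flattening assumes that property names do not collide with the top-level keys id, type, and name; that holds for the property names generated in this tutorial.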

if __name__ == "__main__":
   agent = BiomedicalAIAgent()
   agent.run_analysis_pipeline()

We conclude the tutorial by instantiating the agent and running the full analysis pipeline. This entry point executes all steps, including data generation, graph building, intelligent querying, visualization, and reporting, in a single call, making it easy to explore biomedical knowledge with BioCypher.

In conclusion, this advanced tutorial gives us practical experience working with BioCypher to build scalable biomedical knowledge graphs and perform insightful biological analyses. The dual-mode support ensures that even if BioCypher is unavailable, the system falls back to NetworkX with full functionality. The ability to generate synthetic datasets, run intelligent graph queries, visualize relationships, and export in multiple formats demonstrates the flexibility and analytical power of the BioCypher-based agent. Overall, this tutorial shows how BioCypher can serve as a critical infrastructure layer for biomedical AI systems, making complex biological data usable and insightful for downstream applications.

Sana Hassan, a consulting intern at Marktechpost and a dual-degree student at IIT Madras, is passionate about applying technology and AI to address real-world challenges. With a keen interest in solving practical problems, Sana brings a fresh perspective to the intersection of AI and real-life solutions.