Google AI tool pinpoints genetic drivers of cancer

Google announced DeepSomatic, an artificial intelligence tool that can more accurately identify cancer-related mutations in a tumor’s genetic sequence.
Cancer begins when the controls governing cell division are disrupted. Finding the specific genetic mutations that drive tumor growth is essential for developing effective treatment plans. Doctors now regularly sequence the genomes of cancer cells from biopsies to guide treatments that can target how a particular cancer grows and spreads.
This work, published in the journal Nature Biotechnology, presents a tool that uses convolutional neural networks to identify genetic variants in cancer cells with greater accuracy than current methods. Google has made both DeepSomatic and the high-quality training dataset created for it openly available.
The challenge of somatic variants
Cancer genetics is complex. While genome sequencing reveals genetic variations in cancer, distinguishing between true variants and sequencing errors is difficult, which is where an AI tool could provide welcome assistance. Most cancers are driven by “somatic” variants acquired after birth rather than “germline” variants inherited from parents.
Somatic mutations occur when environmental factors such as ultraviolet radiation damage DNA, or when random errors occur during DNA replication. When these variants alter normal cell behavior, they can cause uncontrolled proliferation, leading to the development and progression of cancer.
Somatic variants are more difficult to identify than inherited variants, because they can be found at low frequencies within cancer cells, sometimes at rates lower than the sequencing error rate itself.
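To make the low-frequency problem concrete, here is a minimal sketch that compares the share of reads supporting a variant with a sequencing error rate. The function names, read counts, and the 1% error rate are illustrative assumptions, not figures from the DeepSomatic work.

```python
def variant_allele_fraction(alt_reads, total_reads):
    """Fraction of reads at a position that carry the variant allele."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return alt_reads / total_reads


def is_below_error_rate(alt_reads, total_reads, error_rate):
    """True when the variant is rarer than the assumed sequencing error rate."""
    return variant_allele_fraction(alt_reads, total_reads) < error_rate


# Hypothetical example: 3 of 1,000 reads support the variant,
# and the sequencer is assumed to misread about 1% of bases.
vaf = variant_allele_fraction(3, 1000)
hard_to_call = is_below_error_rate(3, 1000, error_rate=0.01)
print(vaf, hard_to_call)
```

In a case like this, a simple frequency threshold cannot separate a real somatic variant from noise, which is why additional evidence from the reads is needed.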
How DeepSomatic works
In clinical settings, scientists sequence cancer cells from a biopsy and normal cells from the same patient. DeepSomatic compares the two and identifies the variants in the tumor cells that were not inherited. These variants reveal what fuels tumor growth.
The model converts raw genomic sequencing data from both tumor and normal samples into images that represent different data points, including the sequence data and its alignment along the chromosome. A convolutional neural network analyzes these images to distinguish between the standard reference genome, inherited variants that are normal for the individual, and somatic cancer-causing variants, while filtering out sequencing errors. The output is a list of mutations associated with cancer.
DeepSomatic can also operate in a “tumor only” mode when no normal cell samples are available, which is often the case with blood cancers such as leukemia. This makes the tool applicable across many research and clinical scenarios.
Training a more accurate AI cancer research tool
Training an accurate AI model requires high-quality data. For its AI tool, Google and its partners at the UC Santa Cruz Genomics Institute and the National Cancer Institute created a benchmark dataset called CASTLE. They sequenced cancer cells and normal cells from four breast cancer samples and two lung cancer samples.
These samples were analyzed using three leading sequencing platforms to create a single, accurate reference dataset by combining the outputs and removing platform-specific errors. The data show how the same type of cancer can have vastly different mutational signatures, information that can help predict a patient’s response to specific treatments.
DeepSomatic models performed better than other established approaches across all three major sequencing platforms. The tool excelled at identifying complex mutations called insertions and deletions, or “indels.” For these variants, DeepSomatic achieved an F1 score of 90% on Illumina sequencing data, compared to 80% for the next best method. The improvement was even more dramatic on the Pacific Biosciences data, where DeepSomatic scored more than 80% while the next best tool scored less than 50%.
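For readers unfamiliar with the metric, the F1 score is the harmonic mean of precision (the share of called variants that are real) and recall (the share of real variants that were called). The sketch below shows the standard calculation; the counts are hypothetical and are not taken from the DeepSomatic benchmark.

```python
def f1_score(true_positives, false_positives, false_negatives):
    """Harmonic mean of precision and recall for a set of variant calls."""
    called = true_positives + false_positives   # everything the tool reported
    actual = true_positives + false_negatives   # everything truly present
    precision = true_positives / called if called else 0.0
    recall = true_positives / actual if actual else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts: 90 correct calls, 10 false calls, 10 missed variants.
print(round(f1_score(90, 10, 10), 3))
```

Because it penalizes both false calls and missed variants, a tool cannot raise its F1 score simply by calling more variants or fewer.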
The AI also performed well when analyzing difficult samples. The testing involved a breast cancer sample preserved using the formalin-fixed paraffin-embedded (FFPE) method, a common technique that can damage DNA and complicate analysis. It was also tested on data from whole exome sequencing (WES), a less expensive method that sequences only the roughly 1% of the genome that codes for proteins. In both scenarios, DeepSomatic outperformed the other tools, indicating its utility in analyzing low-quality or historical samples.
An AI tool for all types of cancer
The AI tool showed that it could apply what it learned to types of cancer on which it had not been trained. When used to analyze a sample of glioblastoma, an aggressive brain cancer, it successfully identified the few variants known to cause the disease. In partnership with Children’s Mercy in Kansas City, the researchers analyzed eight pediatric leukemia samples, finding previously known variants and identifying 10 new ones, despite working only with tumor samples.
Google hopes that research laboratories and clinicians will adopt this tool to better understand individual tumors. By detecting known cancer variants, it can help guide current treatment options. By identifying new variants, it could lead to new treatments. The goal is to advance precision medicine and provide more effective treatments to patients.
2025-10-17 13:55:00