Transforming Cancer Biomarker Discovery with Machine Learning


Just as examining a cube-shaped puzzle from different perspectives reveals more details, multiomics approaches provide researchers with a more comprehensive view of cancer’s complex biology, which helps them pinpoint new biomarkers and potential treatment targets.

After losing his favorite aunt to cancer when he was around 10 years old, Dan Theodorescu decided to dedicate his life to helping patients with this devastating disease. Now as a surgical oncologist, translational cancer researcher, and director of the University of Arizona Cancer Center, he splits his time between making critical discoveries in the laboratory and managing a growing cancer program. Recently, Theodorescu’s team developed a precision medicine platform called the Molecular Twin, which uses machine learning to examine genomic, transcriptomic, proteomic, lipidomic, and computational pathology analytes from patients with pancreatic adenocarcinoma (PDAC) and identify new biomarkers.1

Dan Theodorescu and his team created the Molecular Twin, an AI-driven precision medicine platform, to improve the discovery of predictive cancer biomarkers from large and complex multiomics datasets.

What are the benefits and challenges of using multiomics approaches to discover cancer biomarkers?

Multiomics means that scientists are measuring more than one class of analyte, such as DNA, RNA, or proteins. This allows them to fill in the blanks that a single analyte cannot fully report. For example, DNA is a blueprint of biology, but RNA is an actualization of that blueprint. RNA molecules, in turn, are one step closer to proteins, which are the gears of a living organism. By measuring multiple analytes, researchers can see how a mutation in DNA leads to altered RNA levels or mutant protein production.

When searching for novel cancer biomarkers, scientists will often examine multiple analytes in several specimen types, such as urine, serum, or cerebrospinal fluid. As a result, multiomics analysis yields an enormous amount of data. The biggest challenge is how to interpret the information obtained and determine which analytes are informative.

What is the concept behind the Molecular Twin platform and how does it improve biomarker discovery?

We based the Molecular Twin platform on the digital twin concept, which involves creating a virtual representation of a physical object or system.1 Engineers extensively use these computational models for tasks such as jet engine design and vehicle crash testing, allowing them to assess the effect of changing a single variable on the system’s behavior. The idea behind the Molecular Twin was to create a digital molecular avatar of the patient that can help us discover new biomarkers of drug response and therapeutic targets. Now, we are not there yet, but this work is the first iteration of that vision.

Currently, the Molecular Twin platform leverages machine learning to make sense of complex multiomics datasets and identify which analytes could serve as predictive or prognostic biomarkers. Besides just analyzing the cancer itself, we hypothesized that it would be advantageous to quantify the interaction between the cancer and the host and potentially use it as a biomarker. Cancer does not grow in a vacuum, so the host does affect its biology. The Molecular Twin architecture is novel because it analyzes multiomics data related to both the cancer and the host response. It represents the cancer through features such as DNA, RNA, pathology, and histology, while it captures the host using circulating factors such as proteins and lipids, along with clinical characteristics including age, weight, and performance status.

Why is it important to find new biomarkers for PDAC?

Currently, physicians use a tumor’s stage and grade and serum carbohydrate antigen 19-9 (CA 19-9) levels to predict whether a patient will benefit from surgical resection, which is a major operation. The problem is that CA 19-9 is not very accurate, and many patients with PDAC do not benefit from surgery. Using the Molecular Twin platform, we were able to craft a parsimonious, or minimal, biomarker panel that could predict survival outcomes after surgery better than CA 19-9. Although we initially developed and tested the platform using data from patients with PDAC, the Molecular Twin is tumor agnostic. Other scientists are now using it to identify biomarkers for various cancer types.
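
For readers curious about what narrowing a large multiomics table down to a small panel can look like in practice, the following is a minimal illustrative sketch and not the Molecular Twin method itself. It assumes a simple univariate filter (ranking each analyte by its absolute Pearson correlation with an outcome score) on a purely synthetic cohort. The feature names (`rna:gene_A`, `protein:plasma_B`, `lipid:noise_0`, and so on), the continuous outcome score, and the helper functions are all hypothetical and introduced only for illustration.

```python
import random
import statistics


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    mean_x, mean_y = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / (sxx * syy) ** 0.5


def rank_features(table, outcome):
    """Order feature names from strongest to weakest absolute correlation with the outcome."""
    scores = {name: abs(pearson(values, outcome)) for name, values in table.items()}
    return sorted(scores, key=scores.get, reverse=True)


def select_panel(table, outcome, size):
    """Keep only the `size` top-ranked features as a small candidate panel."""
    return rank_features(table, outcome)[:size]


def make_synthetic_cohort(n_patients=200, seed=0):
    """Build a made-up cohort: two informative analytes plus twenty pure-noise analytes."""
    rng = random.Random(seed)
    # Hypothetical continuous outcome score per patient (not real survival data).
    outcome = [rng.gauss(0, 1) for _ in range(n_patients)]
    table = {
        # "Cancer" block: a tumor RNA feature that tracks the outcome.
        "rna:gene_A": [y + rng.gauss(0, 0.3) for y in outcome],
        # "Host" block: a circulating protein that tracks the outcome inversely.
        "protein:plasma_B": [-y + rng.gauss(0, 0.5) for y in outcome],
    }
    # Uninformative analytes that a panel-selection step should discard.
    for i in range(20):
        table[f"lipid:noise_{i}"] = [rng.gauss(0, 1) for _ in range(n_patients)]
    return table, outcome


if __name__ == "__main__":
    table, outcome = make_synthetic_cohort()
    print(select_panel(table, outcome, size=2))
```

A filter like this scores each analyte independently, so it would miss markers that are only informative in combination and would not remove redundant ones. Real panels also require validation on independent patient cohorts before any clinical claim can be made.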

What does the future hold for the Molecular Twin platform?

We will continue to improve the platform by adding more omics, including epigenomics and metabolomics. It is like looking at a cube-shaped puzzle from various perspectives. The more we look at it from different angles, the more we know. I hope that this platform could eventually lead to the democratization of precision medicine, where individual laboratories will not need to have these expensive instruments. Instead, we could centralize these technologies and evaluate only a handful of the most predictive markers to make this analysis more cost-effective and accessible. Ultimately, I can only dream that my team’s work could help people survive cancer and live longer.

This interview has been condensed and edited for clarity.

Reference

1. Osipov A, et al. The Molecular Twin artificial-intelligence platform integrates multi-omic data to predict outcomes for pancreatic adenocarcinoma patients. Nat Cancer. 2024;5(2):299-314.

