How does AI help us fight cancer?
This week we’re diving into the computational war on cancer, where AI helps address the challenge of knowledge fragmentation. A Google Scholar search for cancer-related articles published in 2018 alone returns 143,000 results, or roughly 2,750 research documents every week. Now there’s an AI system that can read and understand them. The system, developed as part of a DARPA program called Big Mechanism, uses natural language processing (NLP) and big data analytics to extract knowledge from this enormous volume of scholarly papers. Through the program, a Leidos-led team of data scientists demonstrated that applying AI to this literature can help fill knowledge gaps, generate new hypotheses, understand important causes and effects, and develop more targeted treatments. To learn more, we welcome Ron Keesing, Director of AI and Machine Learning at Leidos, who explains how AI can help people succeed in their battle against this horrible disease.
Q: What makes the cancer domain an ideal place to test new AI capabilities?
Ron: Obviously, cancer is a really important problem. It’s also a very rich domain because there are literally thousands of cancer-related papers published every week. There’s an enormous amount of human knowledge and understanding about the disease—far more than any individual human can read and understand. So it was an ideal place to test the hypothesis that maybe machines can understand large, complex causal mechanisms even better than humans by looking across all of the knowledge that’s available.
Q: What was the most important goal of the program?
Ron: “Big Mech” was about understanding causal mechanisms that underlie really complex phenomena like cancer. There are a lot of ways that we can find correlations in data sets, but actually understanding causation is a big challenge. In this program we talk about mechanistic causation, or a physical mechanism that makes one thing drive another thing into a new state. Big Mech was all about combining human understanding of mechanistic causation in science with what we can draw from data to learn even more.
Q: Cancer-related academic papers written by pathologists were your primary data sources. What’s in these papers that is so valuable?
Ron: They describe research findings in human terms. Most papers describe individual experiments and report specific phenomena: for example, one protein causing another protein to become altered. There are also papers that describe models of how more complex systems work. Typically, those types of papers are written in review journals by teams of scientists. They might describe a complete model of, let's say, a cancer pathway with multiple proteins that drive a signal through that pathway. These are really important because they typically represent a lot of humans coming together and thinking about how a whole system works. A big part of Big Mech was not only to read individual facts in the cancer literature, but also connect them up to these models that humans have built about cancer. Doing this allows those models to grow and expand as quickly as the rate of new scientific discovery.
Q: What subsets of AI did you find most effective, and how did they work on the program?
Ron: We used NLP to find descriptions of things like causation. The AI actually read human descriptions of new findings in the literature and connected them to models that had already been built out and agreed upon by humans. Because these models are so large and complex, subtle connections like these are easy for humans to miss. There might be an obscure journal where a researcher shows that a key protein in a pathway causes another protein to increase in concentration in a very specific situation. That's an important piece of knowledge, and we can use NLP to extract it from the literature.
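As a rough illustration of the reading-and-linking step Ron describes, the sketch below pulls simple "protein verb protein" statements out of text and sorts them against an existing pathway model. It is a toy under stated assumptions: the regular expression, the verb list, the helper names, and the protein names (KINA, KINB, PROTX and so on) are all hypothetical, and the Big Mechanism reading systems relied on far richer NLP than pattern matching.

```python
import re
from typing import NamedTuple


class Relation(NamedTuple):
    agent: str   # protein that acts
    effect: str  # causal verb, e.g. "phosphorylates"
    target: str  # protein that is changed


# Hypothetical vocabulary of causal verbs; a real reader learns far more.
CAUSAL_VERBS = ("phosphorylates", "activates", "inhibits", "increases", "decreases")

# Toy assumption: protein names are runs of capitals/digits such as "KINA".
PATTERN = re.compile(
    r"\b([A-Z][A-Z0-9]+)\s+(" + "|".join(CAUSAL_VERBS) + r")\s+([A-Z][A-Z0-9]+)\b"
)


def extract_relations(text: str) -> list[Relation]:
    """Return every 'PROTEIN verb PROTEIN' statement found in the text."""
    return [Relation(*groups) for groups in PATTERN.findall(text)]


def link_to_model(relations, model):
    """Split relations into ones the model already contains and new ones
    that touch a protein in the model; unconnected relations are dropped."""
    proteins = {p for rel in model for p in (rel.agent, rel.target)}
    known, novel = [], []
    for rel in relations:
        if rel in model:
            known.append(rel)
        elif rel.agent in proteins or rel.target in proteins:
            novel.append(rel)
    return known, novel


# Hypothetical pathway model agreed on by human curators.
MODEL = {
    Relation("KINA", "phosphorylates", "KINB"),
    Relation("KINB", "phosphorylates", "KINC"),
}

text = ("We find that KINA phosphorylates KINB. Under hypoxia, "
        "KINB increases PROTX. Separately, PROTY inhibits PROTZ.")
known, novel = link_to_model(extract_relations(text), MODEL)
print(known)  # [Relation(agent='KINA', effect='phosphorylates', target='KINB')]
print(novel)  # [Relation(agent='KINB', effect='increases', target='PROTX')]
```

The "novel" relation is the kind of narrow, situation-specific finding that extends a human-built model; the PROTY/PROTZ statement touches nothing in the model, so this toy simply sets it aside.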
Another important part of this is extracting information represented in diagrams and tables. Humans communicate not only in direct statements, but also in organized forms like diagrams and tables that are very rich in information. A lot of times what’s described in the text is the most exciting positive information that was found, but often what’s not described is the negative information—the things that didn’t work. It turns out that the negative information is really important for machine understanding of the literature. Machines are great at finding things that might be true. Negative information helps prune down machine-generated hypotheses to the things that are far more likely to be true.
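One simple way to picture how negative results prune machine-generated hypotheses is to tally positive and negative findings for each candidate and drop any candidate whose share of reported negatives is too high. The function name, the 0.25 threshold, and the example hypotheses are illustrative assumptions, not the program's actual scoring method.

```python
from collections import Counter


def prune_hypotheses(hypotheses, evidence, max_negative_fraction=0.25):
    """Keep hypotheses whose share of negative findings is at or below
    max_negative_fraction (a hypothetical threshold).

    evidence: (hypothesis, observed) pairs, where observed is True for a
    positive finding and False for a reported negative ("no effect").
    """
    positive, negative = Counter(), Counter()
    for hyp, observed in evidence:
        tally = positive if observed else negative
        tally[hyp] += 1

    kept = []
    for hyp in hypotheses:
        total = positive[hyp] + negative[hyp]
        # With no evidence either way, keep the hypothesis open.
        if total == 0 or negative[hyp] / total <= max_negative_fraction:
            kept.append(hyp)
    return kept


hypotheses = ["KINB increases PROTX", "KINB increases PROTY", "KINC inhibits PROTZ"]
evidence = [
    ("KINB increases PROTX", True),
    ("KINB increases PROTX", True),
    ("KINB increases PROTY", True),
    ("KINB increases PROTY", False),  # e.g. a table row reporting no change
    ("KINB increases PROTY", False),
]
print(prune_hypotheses(hypotheses, evidence))
# ['KINB increases PROTX', 'KINC inhibits PROTZ']
```

Without the two negative rows, the PROTY hypothesis would survive on a single positive finding, which is exactly the gap that text-only reading leaves when papers report mainly what worked.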
Q: What impact do you hope this program will have in the ongoing war on cancer?
Ron: In cancer biology, when there’s a difficult individual case at a hospital, they often form something called a molecular tumor board to come up with a plan for how to treat that patient. The technology developed in Big Mech has already been used by these boards in some cases to help recommend individual treatments based on connections found by AI that no human had ever found before.
In addition to assisting individual treatments, this technology is also helping us discover new potential drug targets and drug designs. One of the big challenges across the cancer domain is finding the right places in the pathway that drugs can target. We understand some of those targets, and they receive billions of dollars of research investment from the pharmaceutical industry. Big Mechanism research was able to identify promising new targets to go after so we can design new drugs for them as well. That’s especially important in rarer forms of cancer that haven’t received as much research attention.