The future of treating lung cancer may lie in tapping deep learning tools.
That’s one takeaway from a recently published Penn State University study in which researchers developed a deep learning tool that was more than 71 percent accurate in predicting the expected survival period of lung cancer patients.
“This is a high-performance system that is highly accurate and is aimed at helping doctors make these important decisions about providing care to their patients,” said Youakim Badr, associate professor of data analytics at Penn State Great Valley. “Of course, this tool can’t be used as a substitute for a doctor in making decisions on lung cancer treatments.”
Those decisions, however, can be better informed, particularly when determining which medicines to use and which resources to tap in order to affect lung cancer survival periods.
According to Badr and his team, deep learning techniques are uniquely suited to address lung cancer prognosis because the technology can provide the comprehensive analysis needed in cancer research by incorporating factors such as types of cancer, size of tumors, speed of tumor growth, and demographic data.
“Deep learning is a machine-learning algorithm that makes associations between the data itself and the labels that we use to describe the data examples,” explained Badr. “By making these associations, it learns from the data.”
The structure of deep learning also offers several advantages for many data science tasks, especially in cases involving large datasets.
“It improves performance tremendously,” added Robin G. Qiu, professor of information science and engineering and an affiliate of the Institute for Computational and Data Sciences. “In traditional machine learning, you have a simple structure of layers of neural networks. In each layer, you have a group of cells. In deep learning, there are many layers of these cells that can be architected into a sophisticated structure to perform better feature transformation and extraction, which gives you the ability to further improve the accuracy of any model.”
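To make Qiu's description of stacked layers of cells more concrete, here is a minimal sketch in plain Python. It is not the researchers' model: the layer sizes, the ReLU activation, the random untrained weights, and the three patient features are all assumptions made purely for illustration, and the function names are hypothetical.

```python
import random

def dense_layer(inputs, weights, biases):
    """One layer of 'cells': each cell computes a weighted sum of its inputs, then applies ReLU."""
    return [max(0.0, sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]

def init_layer(n_in, n_out, rng):
    """Create random, untrained weights for a layer with n_in inputs and n_out cells."""
    weights = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases

def build_network(layer_sizes, seed=0):
    """Stack layers: layer_sizes[0] is the input width, every later entry is one layer's width."""
    rng = random.Random(seed)
    return [init_layer(n_in, n_out, rng)
            for n_in, n_out in zip(layer_sizes, layer_sizes[1:])]

def forward(network, features):
    """Pass the features through every layer in turn; each layer transforms the previous output."""
    activations = features
    for weights, biases in network:
        activations = dense_layer(activations, weights, biases)
    return activations

# Made-up, pre-scaled features for illustration only (not actual SEER fields).
patient = [0.4, 0.7, 0.55]

shallow = build_network([3, 4, 1])        # a "simple structure": one hidden layer
deep = build_network([3, 8, 8, 8, 1])     # several hidden layers stacked on top of each other

print(len(shallow), "layers vs.", len(deep), "layers")
print(forward(deep, patient))             # untrained, so the number itself is meaningless
```

The only difference between the two networks is how many times the features are transformed before the final output. In a real system the weights would be learned from labeled examples rather than drawn at random.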
For the study, researchers analyzed data from the Surveillance, Epidemiology, and End Results (SEER) program, one of the largest and most comprehensive databases of early diagnosis information on cancer patients in the US. The program’s cancer registries cover almost 35 percent of US cancer patients.
“One of the really good things about this data is that it covers a large section of the population and it’s really diverse,” said Shreyesh Doppalapudi, a graduate-student research assistant and first author of the paper. “Another good thing is that it covers a lot of different features, which you can use for many different purposes. This becomes very valuable, especially when using machine learning approaches.”
Going forward, the researchers aim to improve the model and test its ability to analyze other types of cancer and other medical conditions.